Efficacy of artificial intelligence-based skin analysis for calculating wrinkle improvement and skin firmness after simultaneous radiofrequency and high-intensity focused ultrasound therapy: a retrospective clinical study

Article information

Arch Aesthetic Plast Surg. 2025;31(2):46-54
Publication date (electronic): 2025 April 30
doi: https://doi.org/10.14730/aaps.2025.01340
Department of Plastic and Reconstructive Surgery, Dongguk University College of Medicine, Gyeongju, Korea
Correspondence: Hea Kyeong Shin, Department of Plastic and Reconstructive Surgery, Dongguk University College of Medicine, 123 Dongdae-ro, Gyeongju 38066, Korea. E-mail: shinheakyeong@hanmail.net
Received 2025 March 10; Revised 2025 April 15; Accepted 2025 April 15.

Abstract

Background

Quantitative skin assessments have transitioned from subjective evaluations to objective approaches. However, clinical application has remained limited due to high costs and reliance on specialized equipment. High-intensity focused ultrasound and radiofrequency are the two most widely used noninvasive modalities for skin tightening and wrinkle improvement. This study investigated the efficacy of artificial intelligence (AI)-based skin analysis as a more accessible and cost-effective tool for assessing skin firmness and wrinkle improvement.

Methods

A retrospective analysis was conducted on 34 patients treated simultaneously with high-intensity focused ultrasound and bipolar radiofrequency between January and February 2025. AI-based skin assessments, evaluating firmness and wrinkle scores, were obtained pre-treatment, immediately post-treatment, and at a 2-month follow-up. Standardized clinical photographs were independently evaluated by two blinded human raters. Logistic regression and correlation analyses were conducted to determine alignment between AI and human evaluations.

Results

AI analysis showed significant improvements in both firmness and wrinkle scores immediately after treatment and at the 2-month follow-up (P<0.05). Human evaluations demonstrated high inter-rater agreement (Cohen’s κ=0.72–0.91). Logistic regression analyses indicated that changes in AI scores significantly predicted human-rated treatment effectiveness at both time points (area under the curve [AUC] for firmness=0.86; AUC for wrinkles=0.73–0.93). Spearman correlation coefficients and the Mann-Whitney U test further supported strong alignment between AI and human assessments.

Conclusions

This study validates the clinical utility of AI-based skin analysis as a reliable quantitative measure for evaluating wrinkle improvement and skin tightening following energy-based rejuvenation treatments. Its predictive validity aligns well with expert human judgment, particularly at delayed follow-up.

INTRODUCTION

Facial aging is a complex, multifactorial process characterized by progressive loss of skin elasticity, wrinkle formation, and tissue sagging. These changes result from intrinsic factors, such as chronological aging and genetic predisposition, as well as extrinsic factors, including ultraviolet radiation, pollution, and lifestyle habits [1,2]. The desire to restore a youthful appearance has driven continuous advancements in aesthetic treatments, with surgical interventions traditionally considered the gold standard for addressing significant age-related changes. However, despite their effectiveness, surgical procedures have considerable drawbacks, including high costs, potential complications, prolonged recovery times, and patient reluctance due to their invasive nature. Thus, there is a growing demand for noninvasive and minimally invasive alternatives that provide effective rejuvenation with minimal downtime [3].

Responding to this demand, various energy-based technologies have emerged, among which high-intensity focused ultrasound (HIFU) and radiofrequency (RF) are two of the most widely adopted modalities for nonsurgical skin tightening and collagen remodeling. HIFU uses focused ultrasound waves to penetrate deeply into the dermis and superficial musculoaponeurotic system, generating thermal coagulation points that stimulate neocollagenesis and tissue contraction. This targeted energy delivery allows significant lifting effects without damaging the epidermis [4-6]. RF, meanwhile, employs electromagnetic waves to produce controlled thermal energy within dermal and subdermal layers, inducing collagen denaturation and subsequent remodeling. This process improves skin elasticity and reduces the appearance of fine lines and wrinkles [7,8]. While each modality independently offers substantial skin-tightening benefits, recent advancements have explored their simultaneous application, hypothesizing superior and more comprehensive rejuvenation effects from their combined use [9,10].

Despite their increasing popularity and clinical use, treatment outcomes remain highly variable and subjective. Factors such as patient age, skin type, baseline collagen levels, and individual biological responses contribute to inconsistent results. Traditionally, treatment efficacy has been assessed through clinical observation, patient self-reporting, and photographic documentation. Recently, objective evaluations have been facilitated by advanced facial skin analysis systems, such as Mark-Vu (PSI Plus Corp.) or Morpheus 3D (Morpheus Co., Ltd.), which quantitatively measure skin texture, elasticity, and wrinkle depth [11]. However, these systems are costly, require specialized equipment, and are often restricted to high-end clinics and research facilities, limiting their widespread adoption.

With advancements in artificial intelligence (AI), there is increasing interest in using AI-based skin analysis as a cost-effective and accessible alternative for evaluating treatment outcomes. AI-powered analysis tools leverage deep learning algorithms and image processing techniques to assess skin parameters such as texture, tone, elasticity, and wrinkle severity [12]. These systems have the potential to standardize assessments, reduce inter-observer variability, and provide quantitative metrics comparable to those obtained from specialized imaging systems. However, the reliability and validity of AI-generated scores remain largely unexplored in the context of facial rejuvenation treatments [13].

This study investigated the feasibility of AI-based skin analysis as an objective assessment tool by evaluating its correlation with human evaluation. Specifically, we sought to determine whether AI-generated scores align with clinical assessments, establishing a reliable and scalable method for measuring treatment effectiveness. By validating AI-driven assessments, this research may contribute to broader adoption of AI technologies in aesthetic medicine, enhancing clinical decision-making and patient satisfaction, and increasing access to objective treatment evaluations across diverse practice settings.

METHODS

A retrospective review was conducted using electronic medical records and clinical photographs of patients who underwent simultaneous treatment with HIFU and bipolar RF (V-RO Advance, Hironic Co.) for facial skin sagging and laxity at the Department of Plastic and Reconstructive Surgery, Dongguk University Gyeongju Hospital, Republic of Korea, between January and February 2025. Patients with local skin diseases, connective tissue disorders, or those who had undergone other skin treatments, including energy-based devices, laser therapy, or botulinum toxin injections within 6 months prior to treatment, were excluded. Patients lost to follow-up were also excluded. Clinical photographs and AI skin analysis (Perfect Corp.) scores were obtained pre-treatment, immediately post-treatment, and at a 2-month follow-up. Demographic data (sex, age, and race), underlying diseases, previous energy-based device and cosmetic treatment history, and procedure details (HIFU and bipolar RF parameters, shots) were collected. Treatments followed the manufacturer’s recommended protocol.

Pre-treatment preparation

Topical anesthetic ointment (EMLA, Wells Pharmtech) was applied to the face for 30 minutes before treatment and then thoroughly removed with soap and water immediately prior to the procedure.

Treatment protocol

Ultrasound gel was evenly applied to the skin. The transducer was securely positioned against the targeted skin area and evenly pressed to ensure optimal contact. For the neck, chin, and cheeks, focused linear HIFU transducers of 3.0 mm, 7 MHz and 4.5 mm, 4 MHz were used. Simultaneous pen-type transducers combining HIFU (3.0 mm, 7 MHz and 4.5 mm, 4 MHz) with bipolar RF (2 MHz) were utilized for the neck, chin, cheeks, and mid-face areas. Additionally, a 1.5 mm, 7 MHz HIFU with a 2 MHz bipolar RF transducer was used for periorbital regions, temples, and forehead. Complete facial treatment required approximately 10 to 15 minutes.

Evaluation

The iOS application of the AI facial skin analysis system (Perfect Corporation) was installed on a mobile phone (iPhone 15, iOS 18.1). The phone was secured on a stand at face level, and patients were seated facing the device on a height-adjustable stool. Photographs were consistently taken from a distance of approximately 50 cm, against a plain jade-colored background to minimize distractions. Standardized ambient lighting was maintained using ceiling-mounted LED lights at a color temperature of 5,500 K in a windowless room to eliminate variability from natural lighting. Prior to analysis, patients removed makeup, glasses, and face coverings, and wore headbands. With the patient’s eyes open and expression neutral, the application automatically detected the face, assessed the adequacy of the lighting, and captured an image using the rear camera without flash. Raw scores were recorded at baseline, immediately after treatment, and at a 2-month follow-up for total wrinkles, firmness, and six specific wrinkle subtypes: “crow’s feet,” “forehead,” “glabellar,” “marionette,” “nasolabial,” and “periocular” (Fig. 1).

Fig. 1.

Artificial intelligence (AI) facial skin analysis application (Perfect Corporation) captures patient photographs and automatically calculates scores for various skin parameters including firmness, wrinkles, eyebags, radiance, spots, texture, dark circles, droopy upper eyelids, pores, droopy lower eyelids, tear trough, acne, redness, moisture, and oiliness.

Independent human evaluators, who were not involved in the treatment or image acquisition, assessed the procedure’s effectiveness. Grayscale pre- and post-treatment photographs from AI analyses were used to minimize selection bias from post-treatment redness. Evaluators assessed treatment efficacy using the Global Aesthetic Improvement Scale, a validated 5-point clinical scale. Each patient’s pre- and post-treatment photographs (immediately post-treatment and at 2-month follow-up) received one of five scores: –1 (worsened), 0 (no change), 1 (improved but additional correction needed), 2 (significant improvement but incomplete correction), or 3 (optimal cosmetic results). For analysis, scores ≥1 were classified as “effective,” while scores ≤0 were classified as “non-effective.” Evaluations were performed in a blinded, randomized manner, without access to patient identity or clinical data.
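The binarization rule described above can be sketched in a few lines of Python. This is an illustration only: the function name `is_effective` and the example scores are hypothetical and are not taken from the study data.

```python
# Valid range of the Global Aesthetic Improvement Scale as used in this study.
GAIS_MIN, GAIS_MAX = -1, 3


def is_effective(gais_score: int) -> bool:
    """Binarize a GAIS score: >= 1 is 'effective', <= 0 is 'non-effective'."""
    if not GAIS_MIN <= gais_score <= GAIS_MAX:
        raise ValueError(f"GAIS score out of range: {gais_score}")
    return gais_score >= 1


# Hypothetical scores from two raters for five patients (not study data).
rater1_scores = [2, 0, 1, -1, 3]
rater2_scores = [1, 0, 0, 0, 2]

rater1_labels = [is_effective(s) for s in rater1_scores]
rater2_labels = [is_effective(s) for s in rater2_scores]

print(rater1_labels)
print(rater2_labels)
```

Rejecting out-of-range values before thresholding keeps a data-entry error from silently being counted as "effective" or "non-effective".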

Statistical analysis

All statistical analyses were performed using Python (v3.11). AI skin analysis scores were collected at baseline, immediately after treatment, and at 2-month follow-up. Differences in AI scores (post-treatment minus pre-treatment) for firmness and wrinkles were calculated. Binary classifications from two independent, blinded human evaluators served as the reference standards. Inter-rater agreement between human evaluators was assessed using Cohen’s kappa statistic.
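As a minimal sketch of the two calculations named in this paragraph, the following computes AI score differences and Cohen's kappa for two raters. The source states only that Python 3.11 was used and does not name any libraries, so this version uses the standard library alone; the function name `cohen_kappa` and all numbers are hypothetical and are not study data.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two raters' labels over the same cases."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and equal in length")
    n = len(a)
    # Observed agreement: fraction of cases where both raters gave the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[k] * freq_b[k] for k in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every case
    return (observed - expected) / (1 - expected)


# Hypothetical AI scores (not study data): difference = post minus pre.
pre_scores = [83, 76, 84]
post_scores = [84, 82, 88]
score_diffs = [post - pre for pre, post in zip(pre_scores, post_scores)]

# Hypothetical binary ratings (1 = effective, 0 = non-effective).
rater1 = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
rater2 = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]
kappa = cohen_kappa(rater1, rater2)

print(score_diffs)
print(round(kappa, 3))
```

In this toy example the raters agree on 8 of 10 cases, but because both label 60% of cases as effective, chance agreement is 0.52 and kappa is correspondingly lower than the raw agreement rate.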

To evaluate alignment between AI scores and human assessments, binary logistic regression analyses were performed, with AI score differences as independent variables and human evaluations (individually or combined using OR logic) as dependent variables. Receiver operating characteristic analysis and area under the curve (AUC) values were calculated to determine predictive performance.
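The workflow in this paragraph (OR-logic combination of raters, single-predictor logistic regression, and AUC) can be sketched as follows. This is an assumption-laden illustration rather than the authors' code: the fitting routine is a plain Newton-Raphson maximum-likelihood fit written with the standard library, the helper names (`fit_logistic`, `predict`, `roc_auc`) are hypothetical, and the data are invented.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient w.r.t. intercept
            g1 += (yi - p) * xi     # gradient w.r.t. slope
            h00 += w                # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def predict(b0, b1, x):
    """Predicted probability of 'effective' for each score change."""
    return [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]


def roc_auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical AI score changes and binary ratings (not study data).
delta = [-1, 0, 0, 1, 1, 2, 2, 3, 4, 5]
rater1 = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
rater2 = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]

# OR logic: effective if at least one rater judged the treatment effective.
combined = [int(a or b) for a, b in zip(rater1, rater2)]

b0, b1 = fit_logistic(delta, combined)
probs = predict(b0, b1, delta)
auc_value = roc_auc(probs, combined)

print(f"beta={b1:.2f}, odds ratio={math.exp(b1):.2f}, AUC={auc_value:.2f}")
```

The toy data are deliberately not perfectly separable: with perfect separation the maximum-likelihood slope is unbounded and this simple fit would not converge.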

Spearman rank correlation coefficients were computed to assess monotonic relationships between AI improvement scores and human evaluations. Mann-Whitney U tests compared AI score differences between groups categorized as effective versus non-effective by human evaluators. Statistical significance was set at P<0.05 with two-tailed 95% confidence intervals.
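For readers who want to see what these two rank-based statistics measure, the sketch below computes Spearman's rho and the Mann-Whitney U statistic with the standard library. It is illustrative only: the helper names and data are hypothetical, P-values are not computed, and the source does not say which library routines were actually used.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman_rho(a, b):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(a), average_ranks(b))


def mann_whitney_u(group1, group2):
    """U for group1: number of pairs with group1 > group2 (ties count 0.5)."""
    return sum((x > y) + 0.5 * (x == y) for x in group1 for y in group2)


# Hypothetical AI score changes and combined human ratings (not study data).
delta = [-1, 0, 0, 1, 1, 2, 2, 3, 4, 5]
effective = [0, 0, 1, 0, 1, 1, 1, 1, 1, 1]

rho = spearman_rho(delta, effective)
eff_group = [d for d, e in zip(delta, effective) if e == 1]
non_group = [d for d, e in zip(delta, effective) if e == 0]
u_stat = mann_whitney_u(eff_group, non_group)

print(round(rho, 3), u_stat)
```

Dividing U by the number of effective/non-effective pairs (here 7 × 3 = 21) gives the same quantity as the AUC of the score change, which is why these tests and the ROC analysis tend to tell a consistent story.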

RESULTS

A total of 40 patients were initially screened for the study. Three patients were excluded due to having undergone skin-related procedures, including energy-based device treatments, laser therapy, or botulinum toxin injections, within the preceding 6 months. Three additional patients were lost to follow-up, resulting in 34 participants included in the final analysis. Among these participants, five were male and 29 were female. Patient ages ranged from 29 to 68 years, distributed as follows: 20–29 (n=1), 30–39 (n=4), 40–49 (n=9), 50–59 (n=10), and 60 and older (n=10). Nineteen patients (56%) had no previous history of cosmetic procedures. Among the remaining 15 patients, prior treatments included HIFU (n=12), RF (n=4), botulinum toxin injections (n=10), dermal fillers (n=11), skin boosters (n=2), and thread lifting (n=1); some patients had undergone multiple treatments. All prior procedures were performed more than 6 months before study participation (Table 1).

Quantitative assessments of skin firmness and wrinkles were obtained using AI-based skin analysis at three time points: baseline (pre-treatment), immediately after treatment, and at 2-month follow-up. Higher scores indicated better skin status, representing fewer wrinkles and reduced facial sagging. The median AI firmness score improved from 83 (interquartile range [IQR], 76–84) at pre-treatment to 84 (IQR, 82–88) immediately post-treatment and 85 (IQR, 83–88) at 2 months. Median AI wrinkle scores similarly improved from 75 (IQR, 70–77) at pre-treatment to 77 (IQR, 73–78) immediately post-treatment and 78 (IQR, 75–80) at 2 months (Fig. 2). Wilcoxon signed-rank tests showed statistically significant increases in both firmness and wrinkle scores at immediate and 2-month follow-ups compared to baseline (P<0.05), indicating measurable improvements based solely on AI analysis (Table 2). Descriptive statistics for individual wrinkle subtypes across all time points are provided in Supplementary Table 1.
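The Wilcoxon signed-rank statistic used for these paired comparisons can be sketched as follows. This is a standard-library illustration under stated assumptions: zero differences are dropped (one common convention), the reported statistic is taken as the smaller of the two rank sums, the function name is hypothetical, and the scores are invented rather than study data. The P-value calculation is omitted.

```python
def wilcoxon_signed_rank(pre, post):
    """Return (W_plus, W_minus) for paired samples; zero differences are dropped."""
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    ordered = sorted(diffs, key=abs)
    w_plus = w_minus = 0.0
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        mean_rank = (i + j) / 2 + 1  # average rank for a run of tied |differences|
        for d in ordered[i:j + 1]:
            if d > 0:
                w_plus += mean_rank
            else:
                w_minus += mean_rank
        i = j + 1
    return w_plus, w_minus


# Hypothetical paired AI scores for eight patients (not study data).
pre = [83, 76, 84, 80, 79, 85, 82, 77]
post = [84, 82, 88, 79, 83, 85, 85, 80]

w_plus, w_minus = wilcoxon_signed_rank(pre, post)
statistic = min(w_plus, w_minus)

print(w_plus, w_minus, statistic)
```

A small statistic relative to the total rank sum means that nearly all non-zero changes point in the same direction, which is the pattern a significant pre/post improvement produces.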

Fig. 2.

Artificial intelligence (AI)-derived scores for firmness and wrinkles across three evaluation points: pre-treatment, immediately post-treatment, and at the 2-month follow-up. Each boxplot illustrates AI-measured changes per time point, with group means indicated by black “X” markers inside each box. Both parameters demonstrated significant improvements sustained through the 2-month follow-up.

Two independent blinded human evaluators assessed treatment effectiveness using grayscale photographs. Inter-rater reliability, calculated via Cohen’s kappa (κ), demonstrated substantial to almost perfect agreement across all domains: firmness (κ=0.82 [immediate], κ=0.72 [2-month follow-up]) and wrinkles (κ=0.84 [immediate], κ=0.91 [2-month follow-up]). These results support the consistency and reliability of human assessments (Fig. 3).

Fig. 3.

Confusion matrices illustrating inter-rater agreement between two independent human evaluators assessing treatment effectiveness for firmness and wrinkles at both immediate and 2-month follow-up evaluations. Each cell presents the absolute number of cases and corresponding percentages relative to total evaluations. X- and Y-axes represent binary classifications (effective or non-effective) made by Rater 1 and Rater 2, respectively. High concentration along the diagonal reflects substantial to almost perfect agreement across all domains, confirming the reliability and consistency of human evaluations used for comparison with artificial intelligence (AI)-derived outcomes.

The degree of alignment between AI assessments and human judgments was analyzed using binary logistic regression. The dependent variable was the binary classification of treatment effectiveness (effective vs. non-effective), and the independent variable was the change in AI score (post-treatment minus pre-treatment). Regression models were constructed for each evaluator individually as well as a combined human evaluation, using OR logic, classifying treatment as effective if at least one evaluator rated it as such.

Immediately post-treatment, improvements measured by AI in both firmness and wrinkle scores significantly predicted human-rated effectiveness. For firmness, the logistic regression yielded an odds ratio (OR) of 2.90 (95% confidence interval [CI], 1.26–6.65; P = 0.012) and an AUC of 0.86, indicating strong predictive performance. For wrinkles, the OR was 1.79 (95% CI, 1.00–3.21; P = 0.048) with an AUC of 0.73 (Table 3, Fig. 4A). These results suggest meaningful alignment between AI-derived scores and immediate human clinical perceptions.
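The relationship between a logistic regression coefficient, its odds ratio, and the confidence interval can be checked with a short calculation. The sketch below assumes a Wald-type interval that is symmetric on the log-odds scale, which the source does not state; the function names are hypothetical. It uses the immediate post-treatment firmness values reported above (OR 2.90; 95% CI, 1.26–6.65) together with the coefficient β=1.06 listed in Table 3.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def implied_log_or_and_se(ci_low, ci_high, z=Z_95):
    """Back out the log odds ratio and its standard error from a Wald CI."""
    log_low, log_high = math.log(ci_low), math.log(ci_high)
    return (log_low + log_high) / 2, (log_high - log_low) / (2 * z)


def wald_ci(beta, se, z=Z_95):
    """Wald CI for the odds ratio: exp(beta -/+ z * se)."""
    return math.exp(beta - z * se), math.exp(beta + z * se)


# Reported values for firmness, immediately post-treatment.
beta_reported = 1.06
or_from_beta = math.exp(beta_reported)          # odds ratio = exp(beta)

beta_mid, se = implied_log_or_and_se(1.26, 6.65)
or_from_ci = math.exp(beta_mid)                 # geometric midpoint of the CI
ci_low, ci_high = wald_ci(beta_mid, se)         # round trip back to the CI

print(round(or_from_beta, 2), round(or_from_ci, 2), round(se, 3))
print(round(ci_low, 2), round(ci_high, 2))
```

Under this assumption, the odds ratio should sit near the geometric midpoint of its interval, and exp(β) should match the reported odds ratio up to rounding of β. Both hold for the firmness values above, which makes the same check a quick way to spot transcription slips in reported intervals.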

Fig. 4.

Logistic regression curves illustrating the relationship between artificial intelligence (AI)-derived score changes and the probability of treatment effectiveness based on combined human evaluations (OR logic). (A) Immediate post-treatment; (B) 2-month follow-up. Dots represent individual human-rated binary outcomes. Solid lines depict fitted logistic regression models. Wrinkle improvement at the 2-month follow-up exhibited the strongest predictive performance.

At the 2-month follow-up, the predictive capability of AI assessments improved for wrinkles and was maintained for firmness. AI firmness score changes strongly predicted human-rated effectiveness (OR, 2.28; 95% CI, 1.19–4.38; P=0.013) with an AUC of 0.86. For wrinkles, the logistic regression showed even stronger predictive power (OR, 5.34; 95% CI, 1.32–21.56; P=0.019) with an AUC of 0.93 (Table 3, Fig. 4B). These findings suggest increased alignment between AI assessments and human judgment of wrinkles as clinical improvements become more pronounced over time. Collectively, the results support AI-based skin analysis as a valid and reliable surrogate for human clinical assessment, particularly at delayed follow-up intervals.

To supplement the logistic regression results, Spearman rank correlation analysis assessed the monotonic relationship between AI score changes and binary human evaluations. Statistically significant positive correlations were observed for both firmness and wrinkle assessments at immediate and 2-month follow-ups (correlation coefficients ranging from 0.36 to 0.62; P < 0.05) (Fig. 5), reinforcing directional agreement between AI assessments and clinical impressions.

Fig. 5.

Heatmap displaying Spearman correlation coefficients between artificial intelligence (AI)-derived score differences and human-rated treatment effectiveness at two evaluation points (immediate and 2-month follow-up), for both firmness and wrinkle assessments. Each cell indicates the correlation coefficient (r) and the associated P-value. Positive and statistically significant correlations were observed across all domains, with generally stronger correlations at the 2-month follow-up.

Additionally, Mann-Whitney U tests evaluated whether AI score changes differed significantly between groups classified as effective versus non-effective by human evaluators. AI-measured score changes for firmness were significantly higher in the “effective” group immediately post-treatment (U=233.5, P=0.001) and at the 2-month follow-up (U=179.0, P=0.002). Similarly, AI wrinkle scores differed significantly between effective and non-effective groups at both immediate (U=165.0, P=0.039) and 2-month follow-up evaluations (U=155.5, P=0.001) (Fig. 6). These findings reinforce the logistic regression results, confirming AI-measured treatment outcomes are both predictive and statistically distinguishable based on human clinical classifications.

Fig. 6.

Violin plots demonstrating the distribution of artificial intelligence (AI)-derived score changes for firmness and wrinkles, stratified by evaluation time point (immediate and 2-month follow-up) and binary human-rated treatment effectiveness (OR logic). Each panel represents a specific condition, categorizing human evaluations as “non-effective” or “effective.” Violin shapes illustrate the density distribution of AI score changes. Individual patient data points are represented as jittered black “X” markers overlaying each violin. AI score improvements were notably higher in the effective group across all domains, aligning with Mann-Whitney U test outcomes.

DISCUSSION

The findings of this study demonstrate that AI-based skin analysis closely aligns with human evaluations. Our statistical analyses revealed a significant correlation between AI-measured improvements in wrinkle and firmness scores and human evaluations of treatment effectiveness. Additionally, logistic regression analyses indicated that AI-generated firmness scores were robust predictors of human evaluation outcomes. These results suggest that AI-driven assessments can serve as objective and scalable tools for evaluating the effectiveness of noninvasive facial rejuvenation treatments.

The ability of AI to provide quantitative, standardized measurements holds important implications for clinical practice, particularly in the growing field of precision and personalized medicine within aesthetics [14]. Given the inherent subjectivity in clinical evaluations and patient-reported outcomes, the objective and quantifiable metrics provided by AI represent a significant advancement [15].

Several sophisticated 3D imaging systems currently exist on the market, including Mark-Vu (PSI Plus Corp.), Morpheus 3D (Morpheus Co., Ltd.), VECTRA WB360 (Canfield Scientific), and the VISIA Skin Analysis System (Canfield Scientific). These systems offer high-precision evaluations and automated analyses of features such as texture, pigmentation, and vascularity [16]. Moreover, advanced systems combining optical coherence tomography, confocal microscopy, and AI can diagnose and monitor skin cancers such as basal cell carcinoma and melanoma [17-20].

However, the high cost and limited accessibility of these sophisticated devices hinder their widespread adoption. Conversely, AI-based analysis using standard mobile devices offers an affordable and convenient solution, expanding the availability of objective skin assessments beyond specialized clinics. Studies by Kontzias et al. [21] and Cook et al. [22] demonstrated that AI skin analysis (Perfect Corporation) yields comparable results to traditional facial skin analysis systems like VISIA Skin Analysis (Canfield Scientific). Integrating AI into clinical practice has the potential to standardize outcome measurements, facilitate treatment monitoring, and enhance patient communication through visual and numerical feedback on treatment progress [23].

Despite promising results, this study has several limitations. First, the AI system used in this study relies exclusively on 2D anterior-posterior image analysis, while high-end imaging systems like 3D facial scanners provide more detailed assessments of skin volume and topography. Future research should include comparative analyses between 2D AI-based systems and 3D imaging systems to evaluate their relative accuracy. Second, unlike conventional facial analysis systems that use standardized positioning guides and controlled environments, the AI-based system in this study permits greater flexibility in facial positioning. This flexibility introduces variability in image capture conditions, potentially affecting measurement consistency. Third, the AI analysis system (Perfect Corporation) is a proprietary commercial platform with undisclosed internal algorithms. Although the manufacturer was contacted for technical details, disclosure was restricted due to intellectual property considerations. Thus, specific computational principles, such as feature weighting, decision thresholds, or image preprocessing techniques, remain unknown. Furthermore, automatic software updates and algorithm changes may influence future outputs, limiting reproducibility and transparency in a scientific context. Future validation studies involving different AI-based skin analysis systems, multiple human raters, and larger sample sizes would help establish broader generalizability.

Despite these limitations, this study significantly advances AI-based facial skin evaluation. By demonstrating alignment between AI-generated assessments and human evaluations, this research highlights AI’s potential to reduce subjectivity in treatment outcome measurements. This could be particularly beneficial in clinical trials and routine practice, where standardization of outcome assessments is essential. Additionally, traditional high-end imaging devices require specialized equipment and trained personnel, limiting their broader adoption. AI-based skin analysis using commercially available mobile applications presents an accessible, cost-effective alternative beneficial for both clinicians and patients. As AI technology continues evolving, it may be integrated with predictive modeling tools that optimize treatment protocols based on individual patient characteristics. The ability to monitor treatment responses in real-time could further refine personalized approaches for noninvasive facial rejuvenation.

This study provides a foundational step toward validating AI-based skin analysis as a reliable assessment tool in aesthetic medicine. Future research involving larger datasets, different AI-based platforms, and comparative studies with multiple 3D imaging systems will be crucial for refining AI-driven evaluation methods. Although AI-based skin analysis is still in its early stages, its ability to provide standardized, quantitative, and objective assessments represents a promising advancement in aesthetic medicine. With further validation and technological improvements, AI-driven evaluation systems could play a pivotal role in improving the precision and accessibility of noninvasive facial rejuvenation treatments.

Notes

No potential conflict of interest relevant to this article was reported.

Ethical approval

The study was approved by the Institutional Review Board of Dongguk University Hospital (IRB approval No. 110757-202503-HR-01-02) and was performed in accordance with the principles of the Declaration of Helsinki. The requirement for written informed consent was waived by the IRB.

Patient consent

The patient provided written informed consent for the publication and use of her images.

Supplemental material

Supplementary materials can be found via https://doi.org/10.14730/aaps.2025.01340.

Supplementary Table 1.

AI-measured wrinkle improvement by subtype at immediate and 2-month follow-up

aaps-2025-01340-supplementary-Table-1.pdf

References

1. Zhang S, Duan E. Fighting against skin aging: the way from bench to bedside. Cell Transplant 2018;27:729–38.
2. Hussein RS, Bin Dayel S, Abahussein O, et al. Influences on skin and intrinsic aging: biological, environmental, and therapeutic insights. J Cosmet Dermatol 2025;24:e16688.
3. Corduff N. Surgical or nonsurgical facial rejuvenation: the patients’ choice. Plast Reconstr Surg Glob Open 2023;11:e5318.
4. Lee IH, Nam SM, Park ES, et al. Evaluation of micro-focused ultrasound for lifting and tightening the face. Arch Aesthetic Plast Surg 2015;21:65–9.
5. Park H, Kim E, Kim J, et al. High-intensity focused ultrasound for the treatment of wrinkles and skin laxity in seven different facial areas. Ann Dermatol 2015;27:688–93.
6. Asiran Serdar Z, Aktas Karabay E, Tatlıparmak A, et al. Efficacy of high-intensity focused ultrasound in facial and neck rejuvenation. J Cosmet Dermatol 2020;19:353–8.
7. Shin JM, Kim JE. Radiofrequency in clinical dermatology. Med Lasers 2013;2:49–57.
8. Dayan E, Burns AJ, Rohrich RJ, et al. The use of radiofrequency in aesthetic surgery. Plast Reconstr Surg Glob Open 2020;8:e2861.
9. Lee SK, Nam SM, Cha HG, et al. Skin tightening efficacy and safety: high-intensity focused ultrasound alone or in combination with monopolar radiofrequency treatment in Republic of Korea: retrospective clinical study. Med Lasers 2023;12:237–42.
10. Byun JW, Kang YR, Park S, et al. Efficacy of radiofrequency combined with single-dot ultrasound efficacy for skin rejuvenation: a non-randomized split-face trial with blinded response evaluation. Skin Res Technol 2023;29:e13452.
11. Singh P, Bornstein MM, Hsung RT, et al. Frontiers in three-dimensional surface imaging systems for 3D face acquisition in craniofacial research and practice: an updated literature review. Diagnostics (Basel) 2024;14:423.
12. Martorell A, Martin-Gorgojo A, Rios-Viñuela E, et al. Artificial intelligence in dermatology: a threat or an opportunity? Actas Dermosifiliogr 2022;113:30–46.
13. Hebel NSD, Boonipat T, Lin J, et al. Artificial intelligence in surgical evaluation: a study of facial rejuvenation techniques. Aesthet Surg J Open Forum 2023;5:ojad032.
14. Seck A, Dee H, Smith W, et al. 3D surface texture analysis of high-resolution normal fields for facial skin condition assessment. Skin Res Technol 2020;26:169–86.
15. Cook MK, Kaszycki MA, Richardson I, et al. Initial validation of a new device for facial skin analysis. J Dermatolog Treat 2022;33:3150–3.
16. Matias AR, Ferreira M, Costa P, et al. Skin colour, skin redness and melanin biometric measurements: comparison study between Antera(®) 3D, Mexameter(®) and Colorimeter(®). Skin Res Technol 2015;21:346–62.
17. Marchetti MA, Nazir ZH, Nanda JK, et al. 3D whole-body skin imaging for automated melanoma detection. J Eur Acad Dermatol Venereol 2023;37:945–50.
18. Jdid R, Pedrazzani M, Lejeune F, et al. Skin dark spot mapping and evaluation of brightening product efficacy using Line-field confocal optical coherence tomography (LC-OCT). Skin Res Technol 2024;30:e13623.
19. Michielon E, Motta AC, Ogien J, et al. Integration of line-field confocal optical coherence tomography and in situ microenvironmental mapping to investigate the living microenvironment of reconstructed human skin and melanoma models. J Dermatol Sci 2024;115:85–93.
20. Chanda T, Hauser K, Hobelsberger S, et al. Dermatologist-like explainable AI enhances trust and confidence in diagnosing melanoma. Nat Commun 2024;15:524.
21. Kontzias C, Pixley JN, Zaino M, et al. Validation of a new facial skin analysis device across Fitzpatrick skin types. J Cosmet Dermatol 2024;23:720–1.
22. Cook MK, Kaszycki MA, Richardson I, et al. Comparison of two devices for facial skin analysis. J Cosmet Dermatol 2022;21:7001–6.
23. Li Z, Koban KC, Schenck TL, et al. Artificial intelligence in dermatology image analysis: current developments and future trends. J Clin Med 2022;11:6826.

Table 1.

Patients’ characteristics (n=34)

Characteristic No. (%)
Age (yr), range 29–68
Age group distribution (yr)
 20–29 1 (2.9)
 30–39 4 (11.8)
 40–49 9 (26.5)
 50–59 10 (29.4)
 ≥ 60 10 (29.4)
Sex
 Male 5 (14.7)
 Female 29 (85.3)
Previous cosmetic procedures
 None 19 (55.9)
 At least one 15 (44.1)
  High-intensity ultrasound 12 (35.3)
  Radiofrequency 4 (11.8)
  Botulinum toxin 10 (29.4)
  Filler 11 (32.4)
  Skin booster 2 (5.9)
  Thread lifting 1 (2.9)

Table 2.

Wilcoxon signed-rank test results of AI skin analysis

| Parameter | Comparison | Wilcoxon statistic | P-value |
|---|---|---|---|
| Firmness | Pre vs. Immediate | 23.0 | <0.05 |
| Firmness | Pre vs. 2 mo | 25.5 | <0.05 |
| Wrinkles | Pre vs. Immediate | 13.5 | <0.05 |
| Wrinkles | Pre vs. 2 mo | 32.0 | <0.05 |

AI, artificial intelligence; Pre, pre-treatment.

Table 3.

Logistic regression model results predicting treatment effectiveness using AI-measured skin improvement scores

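| Parameter | Time point | β | Odds ratio | 95% CI | P-value | AUC |
|---|---|---|---|---|---|---|
| Firmness | Immediate | 1.06 | 2.90 | 1.26–6.65 | 0.012 | 0.86 |
| Firmness | 2 Months | 0.82 | 2.28 | 1.19–4.38 | 0.013 | 0.86 |
| Wrinkles | Immediate | 0.58 | 1.79 | 1.00–3.21 | 0.048 | 0.73 |
| Wrinkles | 2 Months | 1.68 | 5.34 | 1.32–21.56 | 0.019 | 0.93 |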

AI, artificial intelligence; CI, confidence interval; AUC, area under the curve.