Image captioning is the process of automatically generating descriptive sentences for a given image, and text-to-image search retrieves images by matching keywords against image features. We focus on the case in which multiple description sentences are generated for one image. In this study, we used four learning models: 1) a discriminator, a binary classifier that distinguishes skin from background using image segmentation; 2) an autoencoder; 3) a multi-class classification model that combines the features from the discriminator and the autoencoder and produces keyword labels; and 4) a Siamese network that learns textual similarity matching between colloquial description sentences of skin imaging pathology and the keywords produced by the multi-class classifier. The experimental results show that the proposed method achieves an accuracy of up to 99% on the test data for colloquial descriptions of skin images, enabling users to read skin images in colloquial language. For teaching and research on skin diagnosis, the proposed method can significantly relieve the shortage of training personnel and assist hospitals that lack the resources to conduct case studies. The results of this study are expected to be feasible and applicable in actual clinical teaching. For medical education in dermatology, the findings contribute quantitative indicators and assessments of practical value for evaluating the learning outcomes of medical students.
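
The abstract names the four components but not their internal details. The sketch below is one plausible way to wire them together in PyTorch; all layer sizes, the 20-keyword label space, the GRU text encoder, and the feature-fusion strategy are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch of the four-model pipeline (assumed layer sizes and fusion strategy;
# the paper's exact architectures and hyperparameters are not given in the abstract).
import torch
import torch.nn as nn
import torch.nn.functional as F


class Discriminator(nn.Module):
    """Binary skin-vs-background classifier; also exposes its pooled features for fusion."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)  # single logit: skin vs. background

    def forward(self, x):
        f = self.features(x).flatten(1)
        return self.head(f), f


class Autoencoder(nn.Module):
    """Convolutional autoencoder; the bottleneck vector serves as a second image feature."""
    def __init__(self, latent=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, latent),
        )
        self.decoder = nn.Sequential(  # reconstructs a 32x32 image from the latent code
            nn.Linear(latent, 32 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (32, 8, 8)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z


class KeywordClassifier(nn.Module):
    """Multi-class classifier over concatenated discriminator and autoencoder features."""
    def __init__(self, n_keywords=20, disc_dim=32, ae_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(disc_dim + ae_dim, 128), nn.ReLU(),
            nn.Linear(128, n_keywords),
        )

    def forward(self, disc_feat, ae_feat):
        return self.net(torch.cat([disc_feat, ae_feat], dim=1))


class SiameseTextMatcher(nn.Module):
    """Shared-weight text encoder scoring similarity between a colloquial sentence and a keyword phrase."""
    def __init__(self, vocab_size=5000, embed_dim=64, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden, batch_first=True)

    def encode(self, token_ids):
        _, h = self.rnn(self.embed(token_ids))
        return h[-1]

    def forward(self, sentence_ids, keyword_ids):
        return F.cosine_similarity(self.encode(sentence_ids), self.encode(keyword_ids))


if __name__ == "__main__":
    imgs = torch.rand(4, 3, 32, 32)                        # toy batch of skin-image patches
    _, disc_feat = Discriminator()(imgs)
    _, ae_feat = Autoencoder()(imgs)
    keyword_logits = KeywordClassifier()(disc_feat, ae_feat)  # (4, 20) keyword scores
    sent = torch.randint(0, 5000, (4, 12))                 # toy token ids: colloquial sentence
    kw = torch.randint(0, 5000, (4, 3))                    # toy token ids: keyword phrase
    similarity = SiameseTextMatcher()(sent, kw)            # (4,) cosine similarities
```

In this reading, the discriminator and autoencoder each contribute an image feature vector, the classifier maps the fused vector to keyword labels, and the Siamese matcher scores how well a colloquial sentence agrees with the predicted keywords; the actual feature dimensions and training losses would follow the paper.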
               