Abstract

Image recognition can improve the accuracy and reliability of medical diagnosis. Although the field has advanced rapidly with the development of convolutional neural networks, its performance on medical images has not yet reached the level required for clinical diagnosis. This study therefore focuses on improving the representation of important features and on tuning the classification model for medical images.

The basic approach follows the bag-of-visual-words idea, which treats feature points as visual words and infers the content of an image from them. The SIFT method is used to extract feature points, and k-means is used to cluster them into K visual words. A histogram over the K clusters, counting how many feature points fall into each cluster, then represents the characteristics of the image. However, because the histogram alone loses the lower-level information of the image, we propose a new feature representation that adds the distribution and correlation relationships between visual words and feature points to the descriptor while keeping the dimensionality of the feature vector low.
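
The baseline histogram step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: it assumes 128-dimensional SIFT descriptors have already been extracted (for example with OpenCV), uses a plain NumPy k-means in place of whatever clustering library was actually used, and does not include the proposed distribution and correlation terms. The function names `kmeans`, `assign_words`, and `bovw_histogram` are hypothetical.

```python
import numpy as np


def assign_words(descriptors, centers):
    """Return the index of the nearest visual word for each descriptor."""
    # Squared Euclidean distance from every descriptor to every center.
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(descriptors, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of cluster centers."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[start].astype(float)
    for _ in range(n_iter):
        labels = assign_words(descriptors, centers)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def bovw_histogram(descriptors, centers):
    """Normalized count of descriptors assigned to each of the K words."""
    labels = assign_words(descriptors, centers)
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    total = hist.sum()
    return hist / total if total else hist


# Random vectors stand in for SIFT descriptors (assumption for illustration).
rng = np.random.default_rng(1)
vocabulary = kmeans(rng.random((200, 128)), k=8)
image_vector = bovw_histogram(rng.random((50, 128)), vocabulary)
```

Here `image_vector` has one entry per visual word and sums to 1, so images with different numbers of feature points remain comparable.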

In addition, a neural network classifies the images based on their feature vectors. The network is built with TensorFlow in Python, and high recognition accuracy was achieved by adjusting the number of layers and the number of neurons per layer.
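
A classifier whose depth and width can be adjusted might look like the sketch below. It is illustrative only: the fully connected architecture, the ReLU and softmax activations, the Adam optimizer, and the default layer sizes are assumptions, since the abstract does not state them, and `build_classifier` is a hypothetical helper name.

```python
import tensorflow as tf


def build_classifier(input_dim, num_classes, hidden_units=(128, 64)):
    """Build a dense network; hidden_units sets the layer count and widths."""
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(input_dim,)))
    for units in hidden_units:
        # One hidden layer per entry; change the tuple to tune the network.
        model.add(tf.keras.layers.Dense(units, activation="relu"))
    # Softmax output gives one probability per image class.
    model.add(tf.keras.layers.Dense(num_classes, activation="softmax"))
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_classifier(input_dim, num_classes, hidden_units=(256, 128, 64))`, for instance, yields a three-hidden-layer variant, so different layer and neuron counts can be compared by changing a single argument.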

Keywords

Image recognition, Medical diagnosis, Bag of visual words, Convolutional neural network, TensorFlow