PROTOTYPE OF MASK RECOGNITION AND BODY TEMPERATURE IN REAL TIME WITH AMG8833 THERMAL CAM SENSOR FOR COVID-19 EARLY WARNING BASED ON MINICOMPUTER



I. INTRODUCTION
In December 2019, the city of Wuhan in Hubei province, one of the largest cities in China with a population of 14 million, was suspected to be the center of a pneumonia outbreak of unknown cause. On January 7, 2020, Chinese health authorities confirmed that they had identified a new strain of coronavirus. On January 30, 2020, the Director-General of the WHO declared the outbreak in China a Public Health Emergency of International Concern (PHEIC) [1]-[3]. The 2019 coronavirus outbreak, later called COVID-19, is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [4]. Typical symptomatic cases of COVID-19 involve sore throat, fever, muscle pain, and cough [5].
In conducting this research, it is important to survey reference sources such as books, journals, and previous studies. Doing so avoids duplication and plagiarism and helps identify where a new contribution can be made. The following is a review of previous research related to the themes and methods used in this work.
Earlier face detection and recognition approaches were based on an artificial neural network whose classification layer was trained on a set of known facial identities; an intermediate bottleneck layer was then taken as a representation to generalize recognition beyond the identities used in training. The weaknesses of this approach are its indirectness and its inefficiency [6].
Other studies have applied PCA to face recognition; PCA is a linear transformation that can easily be learned in a single network layer. One such study, "Automated Attendance System Using Face Recognition", used the Local Binary Pattern Histogram (LBPH) and Histogram of Oriented Gradients (HOG) methods for face detection [7].
There are many ways of preventing the spread of the COVID-19 virus, and one of the most effective is to wear a mask. During the pandemic, almost everyone wears a face mask at all times when in public. In previously published studies, two advanced object detection models, YOLOv3 and Faster R-CNN, were used to detect masks. The authors trained both models on a data set consisting of images of people in two categories, with and without face masks [8].
Another study proposed face mask recognition and standard-wear detection algorithms based on an improved YOLO-v4. First, an enhanced CSPDarkNet53 was introduced into the trunk feature extraction network, which reduced the computing cost and improved the learning capability of the model. Second, an adaptive image scaling algorithm reduced computation and redundancy. Third, an improved PANet structure was introduced so that the network had more semantic information at the feature layer. Finally, a face mask detection data set was created according to the standard for wearing masks. Various evaluation indices were compared to assess the effectiveness of the model. The comparison shows that the face mask recognition mAP reaches 98.3% at a frame rate of 54.57 FPS, which is more accurate than the existing algorithm [9].
Another paper describes the DeepMasknet framework, which is capable of both detecting face masks and recognizing masked faces. Because no unified and diverse data set existed for evaluating face mask detection and masked face recognition, its authors also developed a large-scale and diverse mask detection and masked face recognition (MDMFR) data set to measure the performance of both kinds of methods. Experimental results on multiple data sets, including cross-data-set setups, demonstrate the superiority of the DeepMasknet framework over contemporary models [10].
Another paper focuses on a new task called masked facial recognition (MFR), which aims to match masked faces with common faces, an important task especially during the global COVID-19 outbreak. To address this task, the paper collects two data sets designed for MFR: MFV, with 400 pairs of 200 identities for verification, and MFI, containing 4,916 images of 669 identities for identification. A robust facial recognition model requires images of millions of identities for training, so hundreds of identities are far from sufficient. Therefore, MFV and MFI are used only as test data sets to evaluate algorithms. In addition, a data augmentation method was introduced to generate synthetic masked face images automatically from existing general face data sets. A new latent part detection (LPD) model was also proposed to find the latent facial parts that are robust to mask wearing, and these latent parts were further used to extract discriminative features. The proposed LPD model is trained end-to-end and uses only native and synthetic training data. Experimental results on MFV, MFI, and synthetic masked LFW showed that the LPD model generalized well to both realistic and synthetic masked data and outperformed other methods by a large margin [11].
Another study presented a method to generate accurate face mask segmentation from input images of arbitrary size. The method used pre-trained weights from the VGG-16 architecture for feature extraction. Training was conducted through Fully Convolutional Networks (FCN) to semantically segment the faces in the image. Gradient Descent was used for training, with Binomial Cross-Entropy as the loss function. The output image from the FCN was then processed to remove unwanted noise, avoid wrong predictions, and create a bounding box around each face. The proposed model also shows good results in recognizing non-frontal faces and can detect multiple face masks in a single frame. Experiments conducted on the Multi Parsing Human Dataset obtained an average pixel-level accuracy of 93.884% for segmented face masks [12].
In another paper, a new face mask detection framework, FMD-Yolo, was created to monitor whether people in public wear masks correctly, which is an effective way to block transmission of the virus. The feature extractor uses Im-Res2Net-101, which combines the Res2Net module with a deep residual network; its hierarchical convolution structures, deformable convolutions, and non-local mechanisms allow comprehensive information to be extracted from the input. An enhanced path aggregation network, En-PAN, was then applied for feature fusion, combining high-level semantic information with low-level detail to improve the robustness and generalizability of the model. Benchmark evaluations were carried out on two public databases, and the results were compared with eight other state-of-the-art detection algorithms. At IoU = 0.5, the proposed FMD-Yolo achieved the best AP50 precision of 92.0% and 88.4% on the two data sets, and at IoU = 0.75 its AP75 exceeded that of the second-best method by 5.5% and 3.9%, which shows the superiority of FMD-Yolo in face mask detection in both theoretical value and practical significance [13].
Building on this existing research, the authors combine a temperature sensor with MobileNetV2-based mask detection to obtain better results.

II. METHODS
A digital image is a two-dimensional array of pixels. Pixel stands for picture element; a digital image is made up of thousands to millions of these tiny dots. The pixels are arranged on a regular rectangular grid, so the horizontal and vertical spacing between pixels is the same throughout the image. An image is called a digital image if it is the result of digital processing on a camera, computer, scanner, or another electronic device [14].
Digital image processing is the processing of an image by a computer so that the resulting image has better quality than the original. Its purpose is to improve the quality of an image so that it can be interpreted and identified easily by a human or a machine (computer).

A. Color Image
A color image is an image that has three color channels. Generally, this type of image consists of red, green, and blue components modeled in the RGB color space. RGB is the standard used to display color images on television and computer monitors, as shown in Figure 1. There are also color images that use other color spaces, such as YCbCr (luma, chroma blue, chroma red), HSV (hue, saturation, value), CMYK (cyan, magenta, yellow, black), and Lab (L*a*b*) [15] [16]. In RGB, each pixel takes a value from 0 to 255 in each channel, giving 256 possible values, so each pixel in one channel is represented by 8 bits of data. Because a color image has three channels, one pixel represents 24 bits (3 × 8 bits) of data; hence, color images are also known as 24-bit color images. The number of possible colors is 16,777,216 (256 × 256 × 256 = 2^24) [17]. For example, a color written with the RGB values (83, 150, 60) has R = 83, G = 150, and B = 60. To the human eye this color appears greenish because the G value is the largest of the three. Black corresponds to the values (0, 0, 0). The color combinations of R, G, and B in the RGB color space and the pixel representation are shown in Figures 1 and 2.
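The 24-bit arithmetic above can be checked with a short sketch. The helper names `pack_rgb` and `dominant_channel` are hypothetical and introduced only for illustration; they are not part of the system described in this paper.

```python
def pack_rgb(r, g, b):
    """Pack three 8-bit channel values into one 24-bit integer (hypothetical helper)."""
    for value in (r, g, b):
        if not 0 <= value <= 255:
            raise ValueError("each channel must be in the range 0-255")
    # R occupies the top 8 bits, G the middle 8 bits, B the lowest 8 bits.
    return (r << 16) | (g << 8) | b


def dominant_channel(r, g, b):
    """Return the name of the channel with the largest value."""
    channels = {"R": r, "G": g, "B": b}
    return max(channels, key=channels.get)


bits_per_pixel = 3 * 8      # three channels of 8 bits each
color_count = 256 ** 3      # number of distinct 24-bit colors, equal to 2 ** 24

packed = pack_rgb(83, 150, 60)
print(bits_per_pixel, color_count, hex(packed), dominant_channel(83, 150, 60))
```

For the example color (83, 150, 60), the dominant channel is G, which matches the greenish appearance described in the text.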

B. Grayscale Image
A grayscale image is an image with only one channel, so each pixel holds a single intensity value (gray level). Because it has only one channel, a grayscale image requires less storage space than a color image. This type of image is also known as an 8-bit image because each pixel value occupies 8 bits of memory. In an 8-bit grayscale image, the range from black to white is divided into 256 gray levels (0-255), where absolute black is denoted by 0 and absolute white by 255 [15].
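As an illustration of the relationship between the two image types, the sketch below converts one RGB pixel to an 8-bit gray level and compares raw storage sizes. The luminance weights (0.299, 0.587, 0.114) are an assumption taken from the common ITU-R BT.601 convention; the paper does not state which conversion it uses, and the function names are hypothetical.

```python
def rgb_to_gray(r, g, b):
    """Convert one RGB pixel to an 8-bit gray level (assumed BT.601 weights)."""
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    # Clamp to the valid 8-bit range after rounding.
    return max(0, min(255, round(gray)))


def raw_size_bytes(width, height, channels):
    """Uncompressed size of an image with 8 bits per channel."""
    return width * height * channels


gray_level = rgb_to_gray(83, 150, 60)
color_bytes = raw_size_bytes(640, 480, 3)   # 24-bit color image
gray_bytes = raw_size_bytes(640, 480, 1)    # 8-bit grayscale image
print(gray_level, color_bytes, gray_bytes)
```

At the same resolution, the uncompressed grayscale image needs one third of the storage of the color image, which is the efficiency gain mentioned above.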

C. Face Recognition
Face recognition is a biometric technique that identifies individuals from their facial features. It is commonly used in online attendance systems and for unlocking screens on mobile devices. The purpose of a facial recognition system is to be easier to use, less error-prone, and safer than a manually operated system. Facial recognition is a developing technology for identifying individuals and is widely studied by researchers and industry. A facial recognition system includes facial image acquisition, face detection, feature extraction, and the matching of facial features [18] [19].

D. Hardware
A microcontroller is a chip that functions as a controller for electronic circuits and can generally store a program. It typically consists of a CPU (Central Processing Unit), memory, input/output ports, and supporting units such as an Analog-to-Digital Converter (ADC). A microcontroller can also be defined as a digital electronic circuit with inputs, outputs, and a control program that can be written and erased. Its main advantage is that RAM and supporting input/output are integrated on the chip, which makes the microcontroller very compact [20].
Based on architecture, microcontrollers are divided into two types: CISC (Complex Instruction Set Computer) and RISC (Reduced Instruction Set Computer). A CISC microcontroller has a large number of complex and complete instructions; examples include the MCS51 from Atmel, the Intel 80C51 (MCS51), and devices from Motorola.

Figure 3. MCS51 Microcontroller
A RISC microcontroller is the other type and has a limited number of instructions. Compared with the CISC type, it has fewer instructions but more registers. RISC instructions typically execute in one clock cycle each, and the memory addressing modes are simpler [21].

E. ARM Microcontroller
ARM, designed by ARM Holdings, is a family of 32-bit and 64-bit processor architectures used in microcontrollers. ARM Holdings licenses the architecture to various companies so that it can be mass-produced; these companies include Samsung, Nvidia, Nuvoton, AMD, Freescale, NXP, Atmel, TI, and ST Micro [22]. Currently, ARM processors are becoming very popular because this processor architecture is used in the

F. AMG8833 Thermal Cam Sensor
The AMG8833 Thermal Cam Sensor developed by Panasonic has an 8x8 array of IR thermal sensors. When connected to a microcontroller (or a Raspberry Pi), it returns an array of 64 individual infrared temperature readings over I2C. It works like a high-specification thermal camera but is compact and simple enough to integrate easily with a Raspberry Pi. It measures temperatures from 0°C to 80°C (32°F to 176°F) with an accuracy of ±2.5°C (±4.5°F) and can detect a human from up to 7 meters (23 feet) away, with a maximum frame rate of 10 Hz. The AMG8833 is the successor to Panasonic's earlier 8x8 thermal IR sensor, the AMG8831, and provides higher performance. The sensor supports only I2C and has a configurable interrupt pin that can fire when any pixel goes above or below a predefined threshold.
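The handling of the 64 readings can be sketched without hardware as follows. This is a minimal illustration, not the authors' code: it assumes the pixel register format commonly documented for the AMG88xx family (a 12-bit two's-complement value with 0.25 °C per least significant bit), which should be verified against the Panasonic datasheet. The function names and the sample values are hypothetical, and no I2C access is performed.

```python
PIXEL_LSB_CELSIUS = 0.25   # assumed resolution per least significant bit
GRID_SIZE = 8              # the sensor exposes an 8x8 pixel array


def raw_pixel_to_celsius(low_byte, high_byte):
    """Convert one two-byte pixel register pair to degrees Celsius."""
    raw = ((high_byte << 8) | low_byte) & 0x0FFF   # keep the 12 data bits
    if raw & 0x0800:                               # sign bit set: negative value
        raw -= 0x1000
    return raw * PIXEL_LSB_CELSIUS


def frame_to_grid(readings):
    """Reshape a flat list of 64 temperatures into 8 rows of 8 values."""
    if len(readings) != GRID_SIZE * GRID_SIZE:
        raise ValueError("expected 64 readings")
    return [readings[row * GRID_SIZE:(row + 1) * GRID_SIZE]
            for row in range(GRID_SIZE)]


def hottest_pixel(grid):
    """Return the highest temperature in the grid."""
    return max(max(row) for row in grid)


# Made-up frame: a 25.0 degree background with one warmer pixel.
sample = [25.0] * 64
sample[27] = 36.5
grid = frame_to_grid(sample)
print(raw_pixel_to_celsius(0x90, 0x00), hottest_pixel(grid))
```

Taking the hottest pixel as the body temperature estimate is only one possible design choice; averaging the pixels that fall on the detected face region would be an alternative.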

A. AMG8833 Thermal Camera Sensor Testing
The sensor was tested by comparing its readings with those of a digital thermometer. The results of the comparison are shown in Table 1. Table 1 shows that the AMG8833 measurements deviate from the digital thermometer by an average of 0.1 °C, or 0.28%. This error is considered acceptable because it does not affect the overall system performance. However, in this study, the AMG8833 sensor still needs to be calibrated.
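The two error figures can be computed as in the sketch below. The readings are hypothetical placeholders chosen for illustration and are not the values of Table 1; the function names are likewise assumptions.

```python
def mean_absolute_deviation(sensor, reference):
    """Average absolute difference between paired readings, in degrees Celsius."""
    if len(sensor) != len(reference) or not sensor:
        raise ValueError("readings must be paired and non-empty")
    return sum(abs(s - r) for s, r in zip(sensor, reference)) / len(sensor)


def mean_percent_error(sensor, reference):
    """Average absolute difference relative to the reference, in percent."""
    if len(sensor) != len(reference) or not sensor:
        raise ValueError("readings must be paired and non-empty")
    return sum(abs(s - r) / r * 100 for s, r in zip(sensor, reference)) / len(sensor)


# Hypothetical paired readings (not the data of Table 1).
amg8833_readings = [36.4, 36.6, 36.3, 36.8]
thermometer_readings = [36.5, 36.5, 36.4, 36.7]

deviation = mean_absolute_deviation(amg8833_readings, thermometer_readings)
percent = mean_percent_error(amg8833_readings, thermometer_readings)
print(round(deviation, 2), round(percent, 2))
```

A calibration step could then subtract a measured offset from each sensor reading, although the paper does not specify the calibration procedure.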

B. Mask Detection Training Results
The camera captures images that serve as input to the Raspberry Pi. Before the system can detect masks, a model must be trained, here on a dataset of 100 face images without a mask and 100 face images with a mask. The authors use the HOG (Histogram of Oriented Gradients) detection method. The dataset is trained with deep learning using MobileNetV2, a successor to the original MobileNet convolutional neural network (CNN) architecture; MobileNet splits a standard convolution into a depthwise convolution and a pointwise convolution. This method is considered suitable because its small memory footprint makes it light enough to run on the Raspberry Pi. Training takes a long time, depending on the capability of the device used. The training process is shown in Figure 5.
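The saving obtained by splitting a convolution into depthwise and pointwise parts can be illustrated by counting weights. This is a sketch of the general technique with an arbitrarily chosen layer size (3x3 kernel, 32 input channels, 64 output channels); it ignores biases and is not a description of any specific MobileNetV2 layer.

```python
def standard_conv_params(kernel, in_channels, out_channels):
    """Weights in a standard convolution: one k x k filter per input/output pair."""
    return kernel * kernel * in_channels * out_channels


def separable_conv_params(kernel, in_channels, out_channels):
    """Weights in a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * in_channels   # one k x k filter per input channel
    pointwise = in_channels * out_channels      # 1x1 convolution that mixes channels
    return depthwise + pointwise


standard = standard_conv_params(3, 32, 64)
separable = separable_conv_params(3, 32, 64)
print(standard, separable, round(standard / separable, 2))
```

For this layer size the separable form needs roughly one eighth of the weights, which is the kind of reduction that makes the architecture practical on a Raspberry Pi.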

Figure 5. Data training process
Training on the mask dataset yielded fairly high accuracy, so the resulting model can be deployed on the Raspberry Pi. The training curves for the mask detector dataset are shown in Figure 6. Figure 6 shows that the validation accuracy reaches 99.28% and the training accuracy reaches 100%, while the validation loss is 0.02 and the training loss is 0.00018. These results were obtained after 20 epochs. An epoch is one full pass of the entire dataset through the training of the deep learning model; 20 epochs therefore means that the entire dataset is used for training 20 times.
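The relationship between epochs and weight updates can be made concrete with a small calculation. The batch size of 32 is a hypothetical value, since the paper does not report it, and for simplicity all 200 images are treated as training data even though part of the dataset must have been held out for validation.

```python
import math


def steps_per_epoch(num_images, batch_size):
    """Number of batches needed to pass over the dataset once."""
    return math.ceil(num_images / batch_size)


def total_updates(num_images, batch_size, epochs):
    """Number of weight updates performed over the whole training run."""
    return steps_per_epoch(num_images, batch_size) * epochs


NUM_IMAGES = 200   # 100 images with a mask and 100 without, as stated in the paper
BATCH_SIZE = 32    # assumed value, not given in the paper
EPOCHS = 20        # as stated in the paper

steps = steps_per_epoch(NUM_IMAGES, BATCH_SIZE)
updates = total_updates(NUM_IMAGES, BATCH_SIZE, EPOCHS)
print(steps, updates)
```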

C. Mask Detection Test Results
The mask detection test used several types of masks commonly worn by the public, to verify that the prototype works properly when visitors wear different masks.
The results of 10 trials show that the prototype detects various types of masks with a 100% success rate, except for masks printed with a picture of a nose and mouth or otherwise resembling a human face, because the picture obscures the shape of the mask, as shown in Figure 7. An error condition also occurred whenever the face changed position during testing. This is due to the limited processing capacity of the Raspberry Pi: the camera frame rate drops while a face is being detected, so movement that is too fast for the reduced frame rate causes an error. In addition, a mask with a texture resembling a nose and mouth creates ambiguity, and the system assumes that the user is not wearing a mask; therefore, masks that resemble faces are not allowed. If the area of the mask covered by the hand is not too large, the system can still read the face and the mask; if the covered area is large, the system reports that no mask is being worn even though one is.
This detection error happened because the hand covered the contour of the mask over the nose. If the covered area is wide but lies only in the middle of the mask, the mask may still be detected. The system also detects photos of masked faces as masked, because in this study the authors did not use a method that can distinguish a real face from a photo of one.

D. Effective Distance Test Results
This test aims to determine the effective distance of the mask detector, both with and without the AMG8833 Thermal Cam sensor integrated, for a person standing still and for a person walking toward the camera. Table 2 lists the effective distances obtained for mask detection and body temperature measurement. In the mask test without the AMG8833 Thermal Cam sensor, the subject started five centimeters from the camera and, after a few seconds, moved backward little by little until the system could no longer read the face and mask. The farthest detection distance was 5 meters and the closest was 5 centimeters.
With the AMG8833 Thermal Cam sensor integrated, detection was tested from a distance of 5 meters down to the closest point to the sensor. The results indicate that the mask detection distance became shorter: the farthest distance was 30 centimeters and the closest was 5 centimeters.
This happened because the AMG8833 temperature sensor has a limited range and tends to give biased readings beyond it. In addition, once the Raspberry Pi 4 was integrated with the sensor, it processed changes in the camera image more slowly, because the detection process requires many cores and the GPU on the Raspberry Pi has a small capacity.
The effective distance of the mask detection system is affected by various factors, such as the type of temperature sensor used, the lighting conditions, the type of mask, and the system hardware, namely the Raspberry Pi and the camera, which has a resolution of only 720 pixels.

IV. CONCLUSION
Based on the research carried out, the trials showed that the AMG8833 Thermal Cam sensor reads effectively at a maximum distance of 30 cm, with a temperature reading deviation of 0.1 °C, and that the algorithm used to detect masks and body temperature was effective and worked as expected. The deep learning method using MobileNetV2, running on the Raspberry Pi as the system processor, worked well and achieved a mask detection accuracy of up to 99%.