Two Phased Histogram of Oriented Gradient Feature Selection Strategy for Face Recognition

ABSTRACT


I. INTRODUCTION
A practical face recognition framework needs to work under various imaging conditions, for example, different face poses and different illumination conditions. Face appearance can change significantly because of illumination changes, camera accuracy, pose, or expression (Adini et al., 1997, pp 721-732). A common approach to overcoming image variations is to use image representations that are insensitive to these variations. However, these variations remain part of the problem, as the features used in recognition have a great effect on the final recognition/classification results.
Feature extraction is one of the phases of the classification process, in which the generated features are used in the selection and classification tasks (Gehler and Nowozin, 2009, pp 221-228). Feature extraction should be done carefully to increase the classification rate of a specific object/person in an image (Dollár et al., 2012, pp 743-761; Lin and Davis, 2010, pp 604-618). One of the most successful image descriptors used for face or object recognition/classification in recent years is the histogram of oriented gradients (HOG) (Dalal and Triggs, 2005, pp 886-893). HOG features are now widely used in object/person recognition and detection. They describe a body shape via the edge directions or gradient directions in the window. Each window region is divided into 64 blocks, with each block having dimensions of 32*32. A histogram of oriented gradients is found for every cell in these blocks. The final descriptor is found by combining all block features in a window to construct the final HOG feature vector (Daniz et al., 2011, pp 1598-1603). This method can have a drawback when used for face recognition, as some of the produced features can be similar among multiple classes. Moreover, the size of the window used has a great and direct effect on the recognition result.
Facial features have been widely used in facial classification as an important technique for biometric security. These days, an individual seen in a video frame or digital picture can be automatically identified by means of a Facial Recognition System (FRS) (Luaibi and Mohammed, 2019). Face recognition as an individual recognition application is a distinctive and valued trait. This characteristic has been shown to offer better outcomes than other biometrics. Practically, all other biometrics require a specific action by the individual. For instance, a person must put their finger on the scanner for fingerprinting and remain in a fixed position in front of a scanner for iris or retina identification. In contrast, it is feasible to employ facial recognition passively, without any action by the person, as face images can be obtained by a camera from a distance. On the other hand, lighting, the person's pose, low resolution, and illumination variation are among the disadvantages of face recognition. Occasionally, a person's face may be blurry or undetectable, which may affect the correctness of the feature extraction process. Face recognition systems also depend heavily on the efficiency of feature extraction to work properly. For this reason, many scholars have attempted to overcome this problem by introducing various feature extraction techniques. Most of these techniques extract features as a vector from feature descriptors. This has proved successful to some extent, but such techniques still suffer from similarity between classes in cases where images fall under a certain variation (Kak et al., 2018, pp 157-168).
In this paper, HOG features are utilized in two phases. First, a small number of features is extracted with a large window size and used to divide the classes into smaller groups based on their similarity. Then, the classes in these groups are classified based on a larger number of features extracted with a smaller window size. The proposed method was tested using two face recognition systems based on SVM and NB classifiers.

II. RELATED WORKS
There are many recent studies on various face recognition techniques. Most of the differences between these works are restricted to the nature of the feature extraction methods. The studies most related to our work are briefly described below. Jain et al. (2013, pp 595-599) presented a face recognition system that uses a combination of wavelet transform, Principal Component Analysis (PCA), and a neural network. The wavelet transform can be employed to decompose an image into different frequency components under different scales of resolution. These represent a decomposition of the image into four sub-bands (LL, LH, HL, HH). LL represents the approximation contents, and the other sub-bands represent details (edges) of the image. PCA is applied to LL to extract the features. These features are utilized to train the classifier based on artificial neural networks. The performance of the classifier is assessed by the recognition rate for different training and test data sets. Al-Arashi et al. (2014, pp 415-420) improved the system performance by integrating PCA with a genetic algorithm (GA) to find the best core distribution of the training data for classification. The resulting accuracy and classification time in this study were better than those of PCA used by itself. Julina et al. (2017) proposed a method to identify whether a given input face image corresponds to a person stored in the dataset. Face recognition is implemented by utilizing Histogram of Oriented Gradients (HOG) features on the AT&T dataset. The feature vectors produced by the HOG descriptor were used to train Support Vector Machines (SVM), and the results were demonstrated with respect to a given test input. The proposed technique investigates whether a test image under different lighting and pose conditions correctly matches the images from the facial dataset. The training set consists of 369 images, with a test set of 41 images.
Each image in the training set is described by HOG features. A total of 369 feature vectors were obtained, with a dimensionality of about 4680 for every image in the training dataset, and the recognition rate was 90.2%. Tousif et al. (2018) proposed a method using a neural network as a classifier for face recognition. The proposed system utilized the Levenberg-Marquardt training technique for a feed-forward neural network. The network is trained on features produced by the Histogram of Oriented Gradients (HOG) of the input images. In that system, supervised learning is implemented with a feed-forward neural network, in which the signal travels in a single direction only. It adopted the Olivetti Research Laboratory (ORL) dataset, whose images have a size of 112x92 and are preprocessed to 27x18. The training images were given to the HOG descriptor. Naik and Lad (2015, pp 2278-2281) generated a side-view image from a 2D face image by creating a mirror image. For the detection stage, the Viola and Jones algorithm was utilized, and Local Binary Pattern was used for feature enhancement. Kumari and Rajesh (2016, pp 581-589) proposed a facial expression recognition system. For region-of-interest detection, the Viola and Jones algorithm was utilized. In addition, Histogram of Oriented Gradients (HOG) was utilized for feature extraction. Li et al. (2004, pp 413-427) used Support Vector Machine (SVM) and Eigenfaces algorithms for face recognition on the front-view image, and Support Vector Regression implemented the pose estimator for face detection.

Vol. (5), No (4), Winter 2020 ISSN 2518-6566 (Online) -ISSN 2518-6558 (Print)

III. PROPOSED SYSTEM
Figure 1 shows the proposed system for face recognition. The system consists of five phases: preprocessing, feature extraction for group classification, the group classifier, feature extraction for class classification, and the class classifier. The following sections explain each phase in detail.

Preprocessing
The preprocessing phase contains two processes: face extraction and face enhancement. In face extraction, the face is extracted from the image by excluding the background and other body parts, such as the neck, based on (Du and Ward, 2005, pp 948-954). The image is then transformed into a grayscale image, and an illumination enhancement based on the wavelet transform is applied to remove irregular dark and light spots on the face (Shin et al., 2008, pp 515-534). Figure 2 shows the images before and after the preprocessing phase.

HOG Feature Extraction
The HOG feature is one of the most used descriptors in classification systems. It is based on the gradient magnitude and the gradient direction of an input image (Dalal and Triggs, 2005, pp 886-893). It has shown significant success in object detection and recognition compared to other descriptors. The main idea behind HOG features is that an object can be characterized by the distribution of its edge directions. HOG is implemented by dividing the image window into 64 blocks, where each block is composed of 2*2 cells. Then, a histogram of gradient directions or edge orientations is calculated over the pixels of each cell in a block. The combined histogram entries form the descriptor blocks, which together are referred to as the Histogram of Oriented Gradients (HOG) descriptor. The following subsections describe each step in obtaining the HOG feature vector, as well as its use in the system proposed in this study.

Gradient Computation
Two steps must be carried out to compute the gradient. In the first step, a centered mask is applied. The most common method is to apply a 1-D centered derivative mask along the horizontal and vertical directions. This step filters the color or intensity data of the image. In the second step, the gradient angle and gradient magnitude are found for each pixel in a cell.
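The two steps above can be illustrated with a minimal sketch in plain Python. It assumes the centered mask is [-1, 0, 1] (the choice reported by Dalal and Triggs), that border pixels reuse their nearest valid neighbour, and that directions are folded into the unsigned range of 0-180 degrees. The function name `gradients` is hypothetical and is not taken from the paper.

```python
import math

def gradients(img):
    """Return per-pixel gradient magnitude and unsigned orientation (degrees).

    img is a list of rows of grey levels. The centred 1-D mask [-1, 0, 1]
    is applied horizontally and vertically; border pixels reuse the
    nearest valid neighbour (an assumption of this sketch).
    """
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag[y][x] = math.hypot(gx, gy)
            # Fold the direction into [0, 180) so opposite edges coincide.
            ang[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0
    return mag, ang

# A vertical step edge: dark on the left, bright on the right.
image = [[0, 0, 10, 10]] * 4
mag, ang = gradients(image)
```

For this toy image the gradient is purely horizontal, so the pixels next to the step carry a non-zero magnitude with an orientation of 0 degrees, while the flat regions have zero magnitude.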

Orientation Binning
Orientation binning involves creating the cell histograms. After the gradient has been computed for each pixel in a cell, its magnitude value is assigned to a bin in the range of 0-180 degrees. Higher magnitude values are considered part of the edge directions, and lower values are discarded. Gradients used in conjunction with 9 histogram channels perform best in human face detection.
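As a rough illustration of orientation binning, the sketch below accumulates the gradient magnitudes of one cell into 9 bins covering 0-180 degrees, so each bin is 20 degrees wide. It is a simplification under stated assumptions: each pixel votes for a single bin without interpolation, and votes are weighted by magnitude rather than thresholded, so weak gradients contribute little instead of being removed outright. The name `cell_histogram` is hypothetical.

```python
def cell_histogram(magnitudes, angles, bins=9):
    """Accumulate gradient magnitudes into `bins` unsigned-orientation bins.

    magnitudes and angles are flat sequences for the pixels of one cell;
    angles are in degrees within [0, 180). Each pixel votes for the single
    bin containing its angle (no interpolation, to keep the sketch short).
    """
    width = 180.0 / bins
    hist = [0.0] * bins
    for m, a in zip(magnitudes, angles):
        hist[int(a // width) % bins] += m
    return hist

# Two strong near-horizontal gradients and one weak near-vertical gradient.
hist = cell_histogram([10.0, 10.0, 2.0], [0.0, 5.0, 95.0])
```

Here the two strong gradients fall into the first bin (0-20 degrees) and the weak one into the fifth bin (80-100 degrees).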

Descriptor Blocks
Features are extracted from each cell, and the cell features are concatenated to construct a block descriptor. The final descriptor is obtained by concatenating the features of all the blocks in the window.
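The concatenation described above can be sketched as follows. The L2 normalisation of each block is an assumption borrowed from Dalal and Triggs; the text here does not state which normalisation, if any, was used. Both function names are hypothetical.

```python
import math

def block_descriptor(cell_hists, eps=1e-6):
    """Concatenate the histograms of a block's cells and L2-normalise them."""
    flat = [v for hist in cell_hists for v in hist]
    norm = math.sqrt(sum(v * v for v in flat) + eps ** 2)
    return [v / norm for v in flat]

def window_descriptor(blocks):
    """Concatenate all block descriptors of a window into one HOG vector."""
    return [v for block in blocks for v in block]

# Four cells (2*2) with 9 bins each give a 36-value block.
cells = [[1.0] * 9 for _ in range(4)]
block = block_descriptor(cells)
hog_vector = window_descriptor([block, block])
```

With 2*2 cells per block and 9 bins per cell, every block contributes 36 values, and the window descriptor grows linearly with the number of blocks.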

The number of extracted HOG features depends on the size of the block window. Using a large window gives fewer features, which results in greater similarity among classes. This characteristic is used to separate the large number of classes into a smaller number of groups based on the similarity among them. The features of these smaller groups are then used for group classification. Afterwards, the group selected by the group classifier undergoes another HOG feature extraction with a larger number of features, which brings out the dissimilarity between face classes (gray area in Fig. 3). The final extracted HOG features are used for class classification.
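The two-phase decision can be sketched as below. This is only an illustration of the control flow: a nearest-centroid rule stands in for the SVM and NB classifiers used in the paper, and the group names, class names, and feature values are all hypothetical toy data.

```python
def nearest(sample, centroids):
    """Return the label whose centroid is closest to `sample` (squared L2)."""
    def dist(label):
        return sum((s - c) ** 2 for s, c in zip(sample, centroids[label]))
    return min(centroids, key=dist)

def two_phase_predict(coarse, fine, group_centroids, class_centroids):
    """Phase 1 picks a group from the coarse (few-feature) vector;
    phase 2 picks a class inside that group from the fine vector."""
    group = nearest(coarse, group_centroids)
    return group, nearest(fine, class_centroids[group])

# Toy data: two groups, each holding two classes (all values hypothetical).
group_centroids = {"g1": [0.0], "g2": [1.0]}
class_centroids = {
    "g1": {"anna": [0.0, 0.1, 0.0], "ben": [0.0, 0.9, 0.0]},
    "g2": {"cara": [1.0, 0.1, 1.0], "dan": [1.0, 0.9, 1.0]},
}
group, person = two_phase_predict([0.9], [1.0, 0.8, 1.0],
                                  group_centroids, class_centroids)
```

The second phase only compares the sample against the classes of the selected group, which is the mechanism the method relies on to reduce confusion between similar classes.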

Classification
Two types of classifiers were used in this research: SVM and NB. The purpose of using multiple classifiers is to give the proposed feature extraction methodology more credibility by showing that it works successfully with more than one classifier. The following sections explain each classifier in detail.

SVM Classifier
The SVM classifier belongs to the family of maximum-margin classifiers. It separates two classes by finding the decision surface that has the greatest distance to the nearest points in the training set, which are called support vectors. Assuming linearly separable data, the objective is to separate the two classes by a hyperplane such that the distance from the hyperplane to the support vectors is maximized. There are two basic strategies for solving n-class problems with SVMs. First, in the one-vs-all approach, n SVMs are trained, and each SVM separates a single class from all the remaining classes. Second, in the pairwise approach, n(n-1)/2 machines are trained, and each SVM separates a pair of classes. The pairwise classifiers are organized in trees, where each tree node is an SVM, in a bottom-up tree similar to the elimination tree used in tennis tournaments (Amrendra.P.Singh, 2019, p. 11).
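To make the two multi-class strategies concrete, the sketch below enumerates the binary problems each one creates. It only counts and lists the tasks and does not train any SVM. The function names are hypothetical, and the use of 40 classes simply mirrors the number of subjects in the ORL dataset.

```python
from itertools import combinations

def one_vs_all_tasks(classes):
    """One binary task per class: that class against all the others."""
    return [(c, [o for o in classes if o != c]) for c in classes]

def pairwise_tasks(classes):
    """One binary task per unordered pair of classes."""
    return list(combinations(classes, 2))

subjects = list(range(40))  # e.g. the 40 ORL subjects
n_ova = len(one_vs_all_tasks(subjects))   # n = 40 machines
n_pairs = len(pairwise_tasks(subjects))   # n(n-1)/2 = 780 machines
```

The pairwise approach needs many more machines, but each one is trained on the data of only two classes.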
The set of descriptors (105 x 36 = 3780 values) is fed to the SVM classifier, which creates a model (a set of support vectors). During the decision stage, the descriptors are computed in the same way as in the learning stage. The decision regarding class membership is made directly by the decision function of the SVM (M. Kachouane, 2012, p. 5). In our study, an SVM trained with sequential minimal optimization is used for classification.
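The figure of 105 x 36 = 3780 values can be reproduced with a short calculation. The text does not state the window and cell sizes behind it, so the sketch assumes the common Dalal and Triggs configuration: a 64 x 128 window, 8 x 8 pixel cells, 2*2 cells per block, a block stride of one cell, and 9 bins. The function name `hog_layout` is hypothetical.

```python
def hog_layout(win_w, win_h, cell=8, block_cells=2, stride_cells=1, bins=9):
    """Return (number of blocks, values per block, descriptor length)."""
    cells_x, cells_y = win_w // cell, win_h // cell
    blocks_x = (cells_x - block_cells) // stride_cells + 1
    blocks_y = (cells_y - block_cells) // stride_cells + 1
    blocks = blocks_x * blocks_y
    per_block = block_cells * block_cells * bins
    return blocks, per_block, blocks * per_block

fine = hog_layout(64, 128)             # (105, 36, 3780)
coarse = hog_layout(64, 128, cell=16)  # (21, 36, 756)
```

Doubling the cell size in this assumed configuration cuts the descriptor from 3780 to 756 values, which illustrates how a coarser layout yields the smaller feature set used for group classification.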

Naïve Bayesian Classifier
Naive Bayes is a classification method based on Bayes' theorem, which is named after Thomas Bayes. The method learns from data and predicts a class, with each class having a probability. Bayes' theorem is shown in the following equation (G.I. Web, 2010):
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
where P(A) and P(B) are the probabilities of observing A and B, and P(B|A) is the probability of observing event B given that A is true.
For the Naïve Bayes classifier, the quantity of interest is P(B|A), where A is an input vector of features and B is a class label. Based on the information in the training data, the final probability P(B|A) of the model should be learned for each combination of A and B. With that model, applying the theorem with the roles of A and B exchanged gives
$$P(B \mid A) = \frac{P(B) \prod_{i=1}^{n} P(a_i \mid B)}{P(A)}$$
where P(B|A) is the probability of class B given the vector $A = (a_1, \dots, a_n)$, P(B) is the prior probability of class B, and $\prod_{i=1}^{n} P(a_i \mid B)$ is the product of the independent probabilities of all features in the vector A given class B. The value of P(A) is the same for every class, so the prediction only needs to calculate $P(B) \prod_{i=1}^{n} P(a_i \mid B)$ and select the class with the maximum value. The independent probability captures the influence of every feature of the data for each class B.
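The decision rule described above, choosing the class B with the largest prior times the product of per-feature probabilities, can be sketched in plain Python. Two assumptions are made that the text does not specify: each feature likelihood is modelled as a Gaussian, which is a common choice for continuous features such as HOG values, and the product is evaluated as a sum of logarithms to avoid numerical underflow. The function names and toy data are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(samples, labels):
    """Estimate the prior P(B) and per-feature mean/variance for each class B."""
    by_class = defaultdict(list)
    for a, b in zip(samples, labels):
        by_class[b].append(a)
    model = {}
    for b, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[b] = (n / len(samples), means, variances)
    return model

def predict(model, a):
    """Return the class B maximising log P(B) + sum_i log P(a_i | B)."""
    def score(b):
        prior, means, variances = model[b]
        log_lik = sum(-0.5 * math.log(2 * math.pi * var)
                      - (x - m) ** 2 / (2 * var)
                      for x, m, var in zip(a, means, variances))
        return math.log(prior) + log_lik
    return max(model, key=score)

# Toy training set: two subjects, two features each.
X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
y = ["s1", "s1", "s2", "s2"]
model = fit_gaussian_nb(X, y)
```

Because P(A) is identical for all classes, it is left out of the score, exactly as in the argument above.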

IV. RESULTS AND DISCUSSION
To test the proposed method, two face datasets were used: the ORL and Essex face datasets. The ORL dataset (also known as the AT&T dataset) consists of 40 subjects, each with 10 different poses or face images (Samaria, F.; Harter, A., 1994). The Essex face94 dataset consists of 20 male and 100 female subjects, with each subject having 20 different face poses and effects (Essex face dataset, 1994). The results obtained for both image datasets are compared with research in the same context that uses HOG with SVM or NB classifiers. Table 1 shows the results obtained with both classifiers for the proposed method, giving both the group and class classification results for both datasets. The results in the table are averages over 10 trials, with 66% of the dataset used for training and 34% for testing, and with different testing samples in each trial. This approach ensures that the obtained results do not depend on one fixed test sample. The table shows group classification scores of 99.5% and 98% for the SVM and NB classifiers, respectively, on the ORL dataset, and 99% and 97.5% for the SVM and NB classifiers on the Face94 dataset. This is very important, as there is less classification error when a decision is made among fewer classes than among the whole set of classes. As shown in the table, the final classification rates are 98.6% and 97% for SVM and NB, respectively, on the ORL dataset. Moreover, the method scored classification rates of 98% and 96.5% for the SVM and NB classifiers, respectively, on the Face94 dataset. These high rates are a direct effect of using a group classifier rather than direct classification, which in the end gives a higher probability of selecting the right class.
The results in Table 1 are compared to other methods that use HOG with the SVM and NB algorithms. First, regarding HOG with SVM, many studies consider various methodologies for utilizing HOG with SVM, such as Singh and Kumar (2016, pp. 289-293) and Dadi et al. (2016, pp 34-44). Table 2 shows the comparison of results when applying these methods to the same datasets.
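The evaluation protocol used for Table 1, an average over 10 trials with a fresh 66%/34% split in each trial, can be sketched as follows. The splitting here is a plain random shuffle; whether the original experiments stratified the split per subject is not stated, so that detail is an assumption. `train_and_score` is a hypothetical caller-supplied function standing in for training and testing the SVM or NB classifier.

```python
import random

def average_accuracy(samples, labels, train_and_score, trials=10,
                     train_fraction=0.66, seed=0):
    """Average test accuracy over `trials` random train/test splits.

    `train_and_score(train, test)` is a caller-supplied (hypothetical)
    function that fits a classifier on `train` and returns its accuracy
    on `test`; both are lists of (sample, label) pairs.
    """
    rng = random.Random(seed)
    data = list(zip(samples, labels))
    cut = round(train_fraction * len(data))
    scores = []
    for _ in range(trials):
        rng.shuffle(data)  # a different test sample in every trial
        scores.append(train_and_score(data[:cut], data[cut:]))
    return sum(scores) / trials

# Dummy scorer used only to show the split sizes: 66 training, 34 test.
sizes = []
mean_score = average_accuracy(
    range(100), [0] * 100,
    lambda tr, te: sizes.append((len(tr), len(te))) or 1.0)
```

Averaging over several random splits reduces the chance that a reported rate reflects one favourable partition of the data.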

Vazquez (2016, pp. 35-44). The results show a significant difference in favor of the proposed method on both datasets, with 11% and 10% better classification rates.

V. CONCLUSION
The feature extraction and selection process is the most important phase in pattern classification methods. The process depends on extracting important and significant information from a collection of images for easier classification. Over the past decades, many scholars have investigated different approaches to enhance recognition systems; however, these studies still depend on a single phase for feature extraction and classification. In this paper, feature extraction and classification are done in two phases. First, feature extraction with a low number of HOG features is applied to assign an image to a group of classes that are similar to one another. After a group is recognized based on these features, another feature extraction phase is applied to extract a feature vector used to find the most likely class. This method was applied with NB and SVM classifiers. The results showed better performance compared to studies that used HOG features for face classification. The classification rate was between 1% and 4.5% better than HOG and SVM methods, and around 10% better than the HOG and NB method. Therefore, it can be concluded that varying the window size in HOG features can be used to divide a large group of classes into smaller subgroups, thus making it easier and more conclusive to recognize the correct classes.