Accuracy Assessment of Unsupervised Land Use and Land Cover Classification Using Remote Sensing and Geographical Information Systems

Abstract: Accuracy assessment is a significant tool for determining how accurate a classification product is. Remote sensing is one of the essential tools for compiling land use and land cover maps through image classification. A reliable classification depends on the availability of high-quality Landsat imagery and secondary data, an accurate classification technique, and the user's knowledge of and competence with the procedures essential to the image classification process. Accuracy assessment establishes whether a satellite image classification is suitable for further mapping and analysis. This paper examined land use and land cover classification using unsupervised classification and further extracted NDBI and NDVI to support the main land use and land cover types in the area. The accuracy of the classifications was assessed using an error matrix and Kappa statistics, both of which proved particularly useful. The land use and land cover, NDBI, and NDVI classifications were all rated as almost perfect by the Kappa statistics. Typically, accuracy measures of unsupervised classification show a moderate level of accuracy; this study observed almost perfect agreement in all accuracy measures. The study is an important source of information that planners and decision-makers may use to plan the environment sustainably.


Introduction
Land use and land cover changes are global environmental phenomena that require regular monitoring to detect changes, identify vulnerabilities, and arrange the precautions needed to minimize or control land degradation. Satellite remote sensing is the most appropriate and time-saving data source for land use and land cover change detection [1]. Digital image classification sorts all pixels in an image into a finite number of individual classes, so the classified image is ultimately a thematic map of the original image. Classification is either supervised or unsupervised. Supervised classification methods need field knowledge of the area, which generally results in more accurate class definitions and higher accuracy. Unsupervised classification clusters spectrally similar pixels in the image. Clustering uses techniques based on spectral signatures to generate spectral classes, and each spectral class is then allocated to a ground class. Various statistical techniques are linked to clustering procedures. Accuracy assessment is a significant phase in satellite imagery classification. Accuracy assessment of remote sensing products is a feedback system for checking and evaluating the objectives and the results [2]. Accuracy is the measure of agreement between a standard assumed to be correct and a classified image of unknown quality. If the classified image corresponds closely with the standard, it is said to be accurate. The accuracy of spatial data has been defined by the United States Geological Survey (USGS) as: "The closeness of results of observations, computations, or estimates to the true values or the values accepted as being true" [3]. However, it must be stated that "truth" has a certain subjective dimension [4]. Users with diverse applications should be able to assess whether the accuracy of the map suits their objectives [5].
Hence, error matrices, also known as confusion or contingency matrices, have become a broadly accepted method for reporting the error of raster data.
Various methods have been developed to evaluate these error matrices. They include non-statistical approaches as well as approaches based on agreement coefficients and on the binomial distribution. Although these methods provide a strong tool for evaluating error matrices, they all make assumptions about how the data for the error matrices are collected. They also assume that the misclassification of a particular area can be determined without ambiguity [6]. The overarching assumption of the accuracy assessment approach is that the error matrix represents the full area mapped using remotely sensed data [7]. The question is therefore whether an appropriate sampling strategy was employed, since it provides the foundation for all subsequent analysis. If this assumption is violated, the results of the accuracy assessment are invalid. As a result, both the error matrix and the entire data collection procedure must be evaluated. According to Congalton [8], the following factors should be considered: the sources of error, the sampling scheme, the number of samples, and the sample unit (ground data collection and sample size). Each of these factors contributes to the overall quality of the accuracy assessment. There are numerous views on how to judge accuracy, and the approach should be chosen according to the specific goals and criteria of the investigation.
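To make the role of the sampling scheme concrete, the following is a minimal sketch of drawing reference points from a classified grid. It assumes a stratified random design (an equal number of points per class), which is only one possible scheme and is not stated to be the one used in this study; the function name `stratified_sample`, the toy grid, and the fixed seed are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(classified, per_class, seed=0):
    """Pick up to `per_class` random pixel positions from every class.

    `classified` is a 2-D list of class labels, a toy stand-in for a
    classified raster. Returns a dict mapping label -> list of (row, col).
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    positions = defaultdict(list)
    for r, row in enumerate(classified):
        for c, label in enumerate(row):
            positions[label].append((r, c))
    sample = {}
    for label, cells in positions.items():
        # A class with fewer pixels than requested contributes all of them.
        k = min(per_class, len(cells))
        sample[label] = rng.sample(cells, k)
    return sample


# Hypothetical 4 x 5 classified grid with three classes.
grid = [
    ["built-up", "built-up", "paddy", "paddy", "water"],
    ["built-up", "built-up", "paddy", "paddy", "water"],
    ["built-up", "paddy", "paddy", "paddy", "water"],
    ["built-up", "built-up", "paddy", "water", "water"],
]

points = stratified_sample(grid, per_class=3)
for label, cells in sorted(points.items()):
    print(label, cells)
```

Sampling per class keeps small classes from being under-represented in the error matrix; a simple random sample over all pixels would instead favor the dominant class.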
A classification error matrix is typically formed to evaluate classification errors. In this table, the classification results for each sample point are given as rows, and the reference (verification) data are given as columns. The diagonal elements of the matrix indicate the numbers of samples for which the classification results agree with the reference data. The matrix thus contains complete information on categorical accuracy. The off-diagonal elements in each row give the numbers of samples that the classifier has assigned to that class incorrectly; this is called a commission error. The off-diagonal elements in each column are the samples of that class that the classifier has omitted; hence, this is called an omission error. To summarize the classification results, overall accuracy is the most commonly used accuracy measure.
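The construction described here can be sketched in a few lines. The example below is illustrative only: the class names and the eight sample points are hypothetical and are not the study's data. Rows hold the classified labels and columns hold the reference labels, following the convention in the text.

```python
def error_matrix(classified, reference, classes):
    """Build an error matrix: rows = classified labels, columns = reference labels."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for c_label, r_label in zip(classified, reference):
        matrix[index[c_label]][index[r_label]] += 1
    return matrix


def overall_accuracy(matrix):
    """Percentage of samples on the diagonal (classified label == reference label)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


# Hypothetical sample points (not taken from the study).
classes = ["built-up", "paddy", "water"]
classified = ["built-up", "built-up", "paddy", "paddy", "water", "water", "paddy", "built-up"]
reference = ["built-up", "paddy", "paddy", "paddy", "water", "water", "paddy", "built-up"]

m = error_matrix(classified, reference, classes)
for name, row in zip(classes, m):
    print(f"{name:10s}", row)
print("overall accuracy (%):", overall_accuracy(m))
```

In this toy case one reference paddy point was classified as built-up, so it appears off the diagonal in the built-up row (a commission error for built-up) and in the paddy column (an omission error for paddy).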
Assessing the accuracy of individual categories requires more specific measures than overall accuracy, because overall accuracy does not indicate how the accuracy is distributed across the individual categories. Examining the confusion matrix allows the user and producer accuracy to be determined. Inter-rater reliability is another measure needed in the accuracy assessment. This generally involves Kappa analysis, a discrete multivariate technique used in accuracy assessment [9]. Kappa analysis yields a K-hat statistic, an estimate of Kappa, which is a measure of agreement or accuracy.
This research aims to assess the suitability of satellite image classification for further analysis. To that end, it classifies satellite images using unsupervised classification, maps the land use and land cover of the study area using remote sensing and Geographical Information System techniques, and performs an accuracy assessment to find out how accurate the classification procedures are and how applicable the classification is for further land mapping. The study area lies within the Colombo district, in the vicinity of Kesbewa town, and covers approximately 62 km2. The terrain is generally flat, with a maximum elevation of 35 meters. The area is rapidly urbanizing and comprises various land use and land cover types. Besides urban and built-up land, paddy cultivation and home gardens are significant. The satellite image of the area, at path/row 141/55, was downloaded via USGS EarthExplorer. The Landsat 8 image, with a resolution of 30 m × 30 m, was captured in 2020 and uses the WGS 1984 UTM zone 44N projection.

Research Object
The image was subjected to atmospheric and geometric correction with ERDAS IMAGINE version 14. The enhanced image was classified using unsupervised classification in ArcGIS version 10.4, and finally, accuracy assessment was done based on the error matrix and the Kappa coefficient. A digital elevation model for the area was also created. Five main classes of land use and land cover were identified: built-up, home gardens, paddy, open spaces, and water bodies. Vegetation cover and built-up land account for a significant portion of the classified image. Thus, vegetation cover and building density were mapped separately using the NDVI (Normalized Difference Vegetation Index) and NDBI (Normalized Difference Built-up Index) spectral indices.

NDVI and NDBI
High values of NDVI indicate high leaf biomass, canopy closure, or leaf area [10]. The ease of calculating NDVI from satellite data and its success in detecting and interpreting vegetation have made it one of the most widely used spectral vegetation indices [11]. NDVI values range from -1.0 to +1.0. Very low values of NDVI (-0.1 and below) correspond to barren rock, sand, or urban/built-up areas. Values around zero indicate water cover. Moderate values (0.1-0.3) represent low-density vegetation, and high values (0.6-0.8) indicate dense vegetation [12], while 0.9-1.0 indicates very dense vegetation or the highest possible density of green vegetation [13], [14].
NDBI extracts built-up features, with index values ranging from -1 to 1. It represents the density of the built-up area on the land surface as the ratio between the difference and the sum of the shortwave-infrared (SWIR) and near-infrared (NIR) reflected radiation recorded in the satellite imagery [15], using the following equation:

$$\mathrm{NDBI} = \frac{\mathrm{SWIR} - \mathrm{NIR}}{\mathrm{SWIR} + \mathrm{NIR}}$$

Higher values indicate dense built-up, urban, or developed land, and lower values indicate less built-up, rural, or undeveloped land [16].
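As an illustration of how the two indices are computed per pixel, here is a minimal sketch. It uses the standard definitions NDVI = (NIR - Red) / (NIR + Red) and NDBI = (SWIR - NIR) / (SWIR + NIR). The band mapping noted in the comments (Landsat 8 OLI band 4 = red, band 5 = NIR, band 6 = SWIR 1) is an assumption about which bands were used, and the reflectance values are hypothetical, not taken from the study image.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or None when the denominator is zero."""
    total = a + b
    if total == 0:
        return None  # index is undefined for this pixel
    return (a - b) / total


def ndvi(nir, red):
    """Normalized Difference Vegetation Index (assumed Landsat 8 bands 5 and 4)."""
    return normalized_difference(nir, red)


def ndbi(swir1, nir):
    """Normalized Difference Built-up Index (assumed Landsat 8 bands 6 and 5)."""
    return normalized_difference(swir1, nir)


# Hypothetical surface-reflectance values for two pixels.
pixels = {
    "vegetated": {"red": 0.05, "nir": 0.45, "swir1": 0.20},
    "built-up": {"red": 0.20, "nir": 0.25, "swir1": 0.30},
}

for name, p in pixels.items():
    v = ndvi(p["nir"], p["red"])
    b = ndbi(p["swir1"], p["nir"])
    print(f"{name}: NDVI={v:.2f}, NDBI={b:.2f}")
```

Because both indices share the same normalized-difference form, a vegetated pixel (high NIR) tends to score high on NDVI and negative on NDBI, while a built-up pixel (SWIR above NIR) shows the opposite pattern.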
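Overall accuracy was calculated from the error matrix as

$$OA = \frac{\sum_{i=1}^{n_c} e_{ii}}{N_T} \times 100, \qquad N_T = \sum_{i=1}^{n_c} \sum_{j=1}^{n_c} e_{ij}$$

where $OA$ is the overall accuracy in %, $n_c$ is the total number of classes, $e_{ii}$ is the element in the $i$-th row and $i$-th column, $N_T$ is the total number of samples, and $e_{ij}$ is the element in the $i$-th row and $j$-th column.

The Kappa coefficient was estimated as

$$\hat{K} = \frac{N \sum_{i=1}^{r} x_{ii} - \sum_{i=1}^{r} \left( x_{i+} \times x_{+i} \right)}{N^{2} - \sum_{i=1}^{r} \left( x_{i+} \times x_{+i} \right)}$$

where $N$ is the total number of sites in the matrix, $r$ is the number of rows, $x_{ii}$ is the number in row $i$ and column $i$, $x_{i+}$ is the total for row $i$, and $x_{+i}$ is the total for column $i$. The categorization of Kappa statistics is widely referenced; however, this study uses a reproduced categorization (Table 1) as per a recent study [17]. A Kappa coefficient equal to 1 means perfect agreement, whereas a value close to zero means that the agreement is no better than would be expected by chance.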
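The Kappa estimate can be sketched directly from an error matrix using the usual form K = (N Σ x_ii - Σ(row total × column total)) / (N² - Σ(row total × column total)). The matrix below is hypothetical, not the study's Table 3. The agreement labels follow the commonly cited Landis and Koch bands; this is an assumption, since the categorization in Table 1 is not reproduced in this text.

```python
def kappa(matrix):
    """Kappa estimate from a square error matrix (rows = classified, cols = reference)."""
    n = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Sum of products of matching row and column totals (chance agreement term).
    chance = sum(r * c for r, c in zip(row_totals, col_totals))
    # Note: undefined (division by zero) if every sample falls in a single class.
    return (n * diagonal - chance) / (n * n - chance)


def agreement_label(k):
    """Agreement band for a Kappa value (assumed Landis and Koch thresholds)."""
    if k < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if k <= upper:
            return label
    return "almost perfect"


# Hypothetical 3-class error matrix with 100 sample points.
example = [
    [30, 2, 0],
    [1, 40, 1],
    [0, 1, 25],
]

k = kappa(example)
print(f"kappa = {k:.4f} ({agreement_label(k)})")
```

Unlike overall accuracy, this estimate discounts the agreement expected by chance, which is why it is reported alongside the error matrix.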

Result and Discussions
Almost all larger extents of home gardens have been converted to open lands. In turn, these areas could be identified as built-up in 2020. Previous land use and land cover were studied using the Google Earth Engine. The environment surrounding the lake also shows more patches of home gardens, built-up areas, and open spaces than in previous years. A significant agglomeration of built-up land could be identified in the north-western and central parts of the study area. Home garden extents have narrowed to minor belts around built-up areas. The western borders are largely classified as water bodies, where part of the Bolgoda lake and the Bolgoda river provide water for agriculture on the adjacent land.
According to the NDBI classification (Table 2), only 21.51 km2 is classified as having no or low building density, which in reality corresponds to paddy lands and water bodies. Although high building density was observed, the maximum NDBI value of 0.18 is far from +1, so building density is significant but not at its theoretical maximum. The NDVI classification further showed that very high vegetation cover is limited to 21.08% (12.95 km2) of the study area, corresponding to paddy land. Another 36.98% (22.71 km2) was classified as home gardens mixed with built-up areas (Table 2). A linear pattern of low or no vegetation cover aligns with the area's transportation lines. Since the NDVI values do not reach +1, only moderate vegetation cover could be observed. The lowest NDVI is -0.18, and the maximum is 0.52. The land use and land cover map and the NDBI and NDVI maps clearly show the existing land features.
Table 3 shows the relationship between the random sample pixel observations and the corresponding classified data, obtained through the error matrix report. Following Rwanga & Ndambuki, the overall accuracy equals the number of correct points divided by the total number of points, multiplied by 100 (for example, 85 correct points out of 100 would give (85/100) × 100 = 85%). The overall accuracy of the land use and land cover classification is 96%. The columns of the confusion matrix of the land use and land cover classification show the classes to which the pixels belong in the validation set, and the rows show the classes to which the image pixels have been assigned [17]. The diagonal shows the pixels that are classified correctly. Pixels not assigned to the proper class do not occur on the diagonal and indicate confusion between the different land cover classes in the class assignment. The off-diagonal elements in a row of the confusion matrix, divided by the total number of pixels assigned to the Landsat image class corresponding to that row, represent the commission errors and describe the confusion between that image class and the other land cover classes.
The commission errors describe the chance that a pixel assigned to a particular class actually belongs to one of the other classes [17]. The omission error refers to reference sites that were left out of, or omitted from, the correct class in the classified map; that is, the real land cover type was omitted from the classified map. An error of omission corresponds to a false negative for the class, which in hypothesis-testing terms is a Type II error.
Producer accuracy characterizes the errors of omission (underestimation): it is the number of correctly identified pixels of a class divided by the total number of pixels of that class in the reference data, so omission error and producer accuracy values are connected [18]. User accuracy is another index, characterizing the errors of commission (overestimation). It is the number of correctly identified pixels of a class divided by the total number of pixels of that class in the classified image. Commission error and user accuracy values are therefore connected in the same way [18].
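The link between these per-class measures can be shown with a short sketch. It assumes the convention used in this paper (rows = classified image, columns = reference data), under which user accuracy is computed along rows and producer accuracy along columns, with commission and omission errors as their complements. The matrix and class names are hypothetical, not the study's results.

```python
def per_class_accuracy(matrix, classes):
    """User/producer accuracy and commission/omission error per class, in percent.

    Assumes rows = classified labels, columns = reference labels, and that
    every class has at least one classified and one reference sample.
    """
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    report = {}
    for i, name in enumerate(classes):
        user = 100.0 * matrix[i][i] / row_totals[i]       # correct / classified as this class
        producer = 100.0 * matrix[i][i] / col_totals[i]   # correct / reference in this class
        report[name] = {
            "user": user,
            "producer": producer,
            "commission": 100.0 - user,
            "omission": 100.0 - producer,
        }
    return report


# Hypothetical 3-class error matrix with 100 sample points.
classes = ["built-up", "paddy", "water"]
matrix = [
    [30, 2, 0],
    [1, 40, 1],
    [0, 1, 25],
]

report = per_class_accuracy(matrix, classes)
for name, stats in report.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Reporting the four values per class shows where the confusion lies, which a single overall accuracy figure cannot do.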
Further, the study considered other metrics derived from the error matrix to describe the accuracy assessment, including commission and omission errors, user and producer accuracy, and Kappa statistics. The user's accuracy reflects the reliability of the classification to the user and is the more relevant measure of the classification's utility in the field. The producer's accuracy, which is equivalent to 'sensitivity', reflects the accuracy of the prediction of a category. [...] of omission, as well as 16.60% with 1 pixel belonging to this category not being identified in this class. A Kappa coefficient of 0.9455 was obtained for the land use and land cover classification of the area for 2020, which is rated as almost perfect. Since the Kappa coefficients for almost all the land use and land cover layers produced for the study area for the target year are close to 1, almost perfect agreement has been demonstrated. The higher the Kappa coefficient, the more accurate the classification is.
Likewise, the accuracy of the NDBI and NDVI classifications was assessed (Tables 5 and 6). The overall accuracy of NDBI and NDVI is 88% and 98%, respectively. The Kappa statistics of both classifications indicate almost perfect agreement. The Kappa value for NDVI is close to 1, while the value for NDBI is lower but still lies within the almost perfect agreement range. All the accuracy tests show that the unsupervised classification of land use and land cover, and of the two targeted features, building density and vegetation cover, is in almost perfect agreement with the reference data.

Conclusion
Accuracy assessment in terms of Kappa statistics and error matrices is mandatory for establishing to what extent a classification is accurate. If supervised classification with ground truthing is done, an accuracy assessment further supports the classification results. An unsupervised classification, however, must undergo an accuracy assessment to ensure its accuracy. Remote sensing is essential for detecting dynamic phenomena such as land use and land cover change through image classification, and the discipline has undergone various improvements. Nevertheless, obtaining accurate land use and land cover information from a Landsat image depends on landscape complexity, the image processing technique, and the classification process, which makes classification challenging. This paper set out to classify and map the land use and land cover of the study area using remote sensing and GIS techniques and to carry out an accuracy assessment to understand how accurate the classification is.
Unsupervised classification was performed, classifying the image into five classes. Built-up is the main land use type in the area, which is further supported by the NDBI and NDVI classifications. Individual accuracy assessment parameters can be used to evaluate the performance of the classification for a single category or class of interest in the study. The error matrix was used to assess accuracy in this study. Overall classification accuracies and Kappa coefficients were high in this investigation. The Kappa coefficient indicates almost perfect agreement, so the classified image is suitable for further investigation. Accurately classified land use and land cover data can be used to produce maps. Such maps could be used to detect spatio-temporal changes in land use and land cover, land degradation, soil and agricultural land changes, and land temperature changes, and to support various other land-related studies.