This fall I begin Special Topics in GIS as I enter the last semester for my graduate certificate in GIS. This week's focus was on methods of calculating spatial data quality.
The first task was to calculate the horizontal and vertical accuracy and precision of data acquired from a handheld GPS device. Accuracy is how close values are to an accepted reference value, while precision is how close values are to one another (for example, a tight cluster of GPS points from the same device measuring the same spot would be precise, even if the cluster sat far from the true location).
We started with a collection of GPS points that all recorded the same location. To average them, I found the mean latitude and longitude of all the collected GPS points and saved the resulting average point in its own feature class.
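The averaging step can be sketched in a few lines of Python. The coordinates below are hypothetical stand-ins for the collected waypoints, just to show the calculation:

```python
import statistics

# Hypothetical repeated GPS fixes (lat, lon) recorded at one location
waypoints = [
    (28.0643, -80.5712),
    (28.0641, -80.5714),
    (28.0645, -80.5710),
    (28.0642, -80.5713),
]

# The "average point" is simply the mean latitude and mean longitude
avg_lat = statistics.mean(p[0] for p in waypoints)
avg_lon = statistics.mean(p[1] for p in waypoints)
print(avg_lat, avg_lon)
```

In ArcGIS the same result comes from averaging the latitude and longitude fields and creating a new point feature from those means.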
Next, I made a multi-ring buffer around this average point. I calculated a distance for each ring such that 50%, 68%, or 95% of the points would fall within the buffer. To find the index for each percentile, I took the waypoints feature class that had been spatially joined with the average point (which created a distance field), sorted the distances, and multiplied the total number of waypoints by the desired percentile to get the corresponding index. The distance at that index is the radius within which that percentage of points falls. This multi-ring buffer shows how precise the data collection process was.
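The percentile lookup described above can be sketched as follows. The distance values are hypothetical, standing in for the distance field produced by the spatial join:

```python
import math

# Hypothetical distances (meters) from each waypoint to the average point,
# i.e. the distance field created by the spatial join
distances = [1.2, 2.5, 3.1, 3.6, 4.0, 4.9, 5.6, 6.3, 7.8, 9.1]

def percentile_distance(dists, pct):
    """Buffer radius within which `pct` of the points fall.

    Sort the distances, multiply the point count by the percentile to get
    a rank, and read off the distance at that rank.
    """
    ordered = sorted(dists)
    idx = math.ceil(len(ordered) * pct) - 1  # 1-based rank -> 0-based index
    return ordered[idx]

for pct in (0.50, 0.68, 0.95):
    print(f"{pct:.0%} of points within {percentile_distance(distances, pct)} m")
```

Each of these three distances then becomes one ring of the multi-ring buffer.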
68% of the points in this data collection fell within 4.4 meters of the location of the averaged waypoint.
Another important component of data quality is accuracy. We measured it against an absolute reference point that was established outside of the data collection process. The majority of this work was done in Microsoft Excel using .dbf files: I compared the waypoint data against the benchmark data to calculate the values used in the cumulative distribution function (CDF) graph.
Consulting this graph shows the likelihood that a given reading will fall within a given distance of the reference point. This particular GPS device has only about a 10% chance of being within a meter of the reference point on any particular reading.
Knowing the accuracy of a GPS device is important, since a project that requires high positional accuracy may suffer if the device cannot deliver it.
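Building an empirical CDF like the one made in Excel can be sketched in Python. The error values below are hypothetical, chosen only to illustrate the calculation:

```python
# Hypothetical horizontal errors (meters) between each GPS reading and the
# surveyed benchmark, as would be computed from the .dbf tables
errors = [0.6, 1.3, 1.8, 2.2, 2.7, 3.1, 3.4, 4.0, 4.8, 6.2]

# Empirical CDF: sort the errors; the CDF value at the k-th smallest
# error is k / n, i.e. the fraction of readings at or below that error
ordered = sorted(errors)
n = len(ordered)
cdf = [(e, (k + 1) / n) for k, e in enumerate(ordered)]

# Estimated probability that a single reading lands within 1 m of the benchmark
p_within_1m = sum(e <= 1.0 for e in ordered) / n
print(p_within_1m)
```

Plotting the `cdf` pairs (error on the x-axis, cumulative fraction on the y-axis) reproduces the kind of graph used to read off how likely a reading is to fall within any given distance.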