Archaeological GIS | Dean Snow



GIS In Archaeology

Lab Exercise 14

During the exercises in Vector and Raster data analysis we investigated the potential relationship between archaeological site locations and various environmental factors: distance to water, agricultural soil class, elevation, and slope.
For each of these variables we examined the data to see whether there was any discernible relationship between the distribution of that factor and the locations of archaeological settlements. In each case there appeared to be a range of values that correlated with the presence or absence of archaeological sites. In this exercise we will create a composite analysis that takes all four factors into account simultaneously to build a more robust model.
To create the model, the entire survey area will be scored on each of the four factors. The individual factor scores will then be summed to create a composite score, which will be used to classify the study area by the likelihood of locating a site in any given location. Finally, the model will be compared with the site data to assess how well it performs.

Step 1: Create Study Area Map
For this step you can use the map document saved from the data analysis labs, but it may be cleaner to create a new map document.
Open ArcCatalog and ArcMap and create a new map in ArcMap. From ArcCatalog bring the following layers into the map document:
Sites_Training
Sites_Test
Rivers
Survey_Limits
Agricultural_Potential
DEM_100M

Step 2: Create Factor Likelihood Scores
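In previous exercises, the locations of 100 of the original 200 sites (the training sites) were used to determine which values of each factor correlated with the presence or absence of archaeological sites. The results of those analyses can be summarized as follows, where each entry indicates whether the number of sites was greater than (>), about equal to (=), or less than (<) the number expected by chance: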

Agricultural Land Type
Land Type         Sites
1, 2, 3, 5        > Expected
4, 6              = Expected
7, 8, 9           < Expected

Distance to River
Distance          Sites
< 500 meters      > Expected
500-1000 meters   = Expected
> 1000 meters     < Expected

Elevation
Elevation         Sites
< 1000 m          > Expected
1000-1300 m       = Expected
> 1300 m          < Expected

Slope
Slope             Sites
0-5 degrees       > Expected
5-10 degrees      = Expected
> 10 degrees      < Expected

In this model the land will be divided into 1 ha units (100 x 100 meter cells), and each unit will be scored from 1 to 3 points on each of the four factors separately. The simplest way to divide a region into mutually exclusive units of a fixed size is to model the data as grids. In this way each grid cell within the study area can be scored for each factor, and the final composite score can be created by simply adding the separate factor scores for each cell.

Step 3: Set Options for Creating Factor Grids
One of the data sets used in this study, DEM_100M, is already modeled as a 1 ha grid. The simplest tactic, therefore, is to use this grid as the basis for defining the extent and cell size of the other grids. In addition, we only want to analyze land parcels that lie within the boundaries of the study area, so the Survey_Limits layer will be used as the analysis mask.
On the Spatial Analyst toolbar, select Options from the popup menu. On the General tab of the window, set Survey_Limits as the Analysis mask.


On the Extent tab, select “Same as Layer ‘dem_100m’” as the Analysis extent.

Finally, on the Cell size tab, select “Same as Layer ‘dem_100m’” as the Analysis cell size.

Click OK to accept these settings. The project is now set to analyze grid data using a 1 ha cell size, with the analysis limited to the cells within the boundary of the survey area.
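To make the effect of these settings concrete, here is a minimal Python sketch of what an analysis mask does. It assumes grids are plain lists of rows that already share the dem_100m extent and 100 m cell size, and it uses None to stand in for NoData. The function name apply_mask and the sample values are hypothetical, and this is not how ArcGIS itself stores or processes rasters.

```python
def apply_mask(grid, mask):
    """Return a copy of grid with every cell outside the mask set to NoData (None).

    grid and mask must have identical dimensions, which is what setting both the
    analysis extent and the cell size to dem_100m is meant to guarantee.
    """
    if len(grid) != len(mask) or any(len(g) != len(m) for g, m in zip(grid, mask)):
        raise ValueError("grid and mask must have the same dimensions")
    return [
        [value if inside else None for value, inside in zip(grid_row, mask_row)]
        for grid_row, mask_row in zip(grid, mask)
    ]


# Hypothetical 3 x 3 elevation grid (meters) and survey-limits mask (True = inside).
dem = [[710, 720, 735],
       [705, 980, 1250],
       [700, 1100, 1400]]
survey = [[True, True, False],
          [True, True, True],
          [False, True, True]]

masked_dem = apply_mask(dem, survey)
print(masked_dem)  # [[710, 720, None], [705, 980, 1250], [None, 1100, 1400]]
```

Every later grid in the exercise inherits this NoData pattern, which is why cells outside the survey limits never receive a score.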

Step 4: Score Area for Agricultural Land Types
Agricultural land types are currently modeled as a vector polygon coverage that identifies each land parcel as LType 1 through LType 8 (LType 9 represents water). In ArcMap it is relatively simple to convert between vector and raster data sets.
Click on the Spatial Analyst toolbar and select Convert -> Features to Raster from the popup menu.
In the Features to Raster window, select Ag_Potential as the Input features and LType as the Field. Verify that the Output cell size is set to 100 (this should be the default, since it was specified in the Options dialog in the previous step). Finally, name the Output raster LType_Grid.

The result is a raster grid that identifies each cell as belonging to LType 1 through 9.

For the purposes of this analysis we want to simplify the 9 LType classes down to 3. Based on the previous analysis, land types 1, 2, 3 & 5 have more than the expected number of sites, 4 & 6 have about the expected number, and 7, 8 & 9 have fewer than the expected number of sites. We will reclassify the data as follows:
LType          Score
1, 2, 3, 5     3
4, 6           2
7, 8, 9        1

From the Spatial Analyst popup menu, select Reclassify. Select LType_Grid as the Input raster. In the reclassification table, use the scores listed above to assign each LType value a new value of 1, 2 or 3. Name the Output raster LType_Score.

After clicking the OK button a new layer will be added to the map that ranks the land types into the 3 groups specified above.
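The reclassification table amounts to a lookup from each LType value to a score. A minimal Python sketch, assuming the grid is a list of rows with None for NoData; the names reclassify_lookup and LTYPE_SCORES and the sample grid are hypothetical.

```python
# Step 4 score table: LType -> likelihood score (3 = more sites than expected).
LTYPE_SCORES = {1: 3, 2: 3, 3: 3, 5: 3,
                4: 2, 6: 2,
                7: 1, 8: 1, 9: 1}


def reclassify_lookup(grid, table):
    """Replace each cell value with table[value].

    NoData (None) stays NoData; in this sketch a value missing from the
    table also becomes NoData.
    """
    return [[None if v is None else table.get(v) for v in row] for row in grid]


# Hypothetical LType_Grid with one NoData cell outside the survey limits.
ltype_grid = [[1, 4, 9],
              [5, 7, None],
              [2, 6, 8]]

ltype_score = reclassify_lookup(ltype_grid, LTYPE_SCORES)
print(ltype_score)  # [[3, 2, 1], [3, 1, None], [3, 2, 1]]
```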

Step 5: Score Area for Distance to Rivers
Like the Ag Potential layer, the Rivers layer is currently modeled as a vector layer. Unlike the Ag Potential layer, however, the Rivers layer only records where rivers are located and does not classify the rest of the study area. The Spatial Analyst tool set has a tool that makes it easy to classify an area by the distance of each cell from the features in a data set.
From the Spatial Analyst popup menu, select Distance -> Straight Line. Set Rivers as the Distance to layer, verify that the Output cell size is set to 100, and name the output raster River_Dist.
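Conceptually, the Straight Line tool records for every cell the Euclidean distance to the nearest river. The brute-force Python sketch below assumes the rivers have already been rasterized onto the same 100 m grid (True where a river crosses a cell) and measures from cell center to cell center. ArcGIS works from the vector features with a far more efficient algorithm, so treat this only as an illustration; the function name straight_line_distance is hypothetical.

```python
import math


def straight_line_distance(source, cell_size=100.0):
    """Distance (map units) from each cell center to the nearest source cell center.

    source is a grid of booleans, True where a river crosses the cell.
    Brute force over every source cell: fine for a toy grid, far too slow
    for a real survey area.
    """
    sources = [(r, c)
               for r, row in enumerate(source)
               for c, is_source in enumerate(row) if is_source]
    if not sources:
        raise ValueError("grid contains no source cells")
    return [
        [cell_size * min(math.hypot(r - sr, c - sc) for sr, sc in sources)
         for c in range(len(row))]
        for r, row in enumerate(source)
    ]


# Hypothetical rasterized river running east-west along row 1.
rivers = [[False, False, False, False],
          [True,  True,  True,  True],
          [False, False, False, False],
          [False, False, False, False]]

river_dist = straight_line_distance(rivers)
print(river_dist[0])  # [100.0, 100.0, 100.0, 100.0]
print(river_dist[3])  # [200.0, 200.0, 200.0, 200.0]
```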

The study area is now overlaid with a grid that records the distance in meters from each 1 ha cell to the nearest river. Again we will simplify these values as follows:
Distance (m)     Score
< 500            3
500-1000         2
> 1000           1

Open the Reclassify tool from the Spatial Analyst toolbar. Select River_Dist as the Input raster, and set River_Score as the Output raster.

Click the Classify button to bring up the Classification window. Set the classification Method to Equal Interval, set the number of Classes to 3, and then set the break values to 500, 1000, and 5000. Click OK to close the Classification window.

Back in the Reclassify window, set the New values so that 0-500 receives a value of 3, 500-1000 a value of 2, and 1000-5000 a value of 1.

Click OK on the Reclassify window. The River_Score layer is then added to the map document.
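The break values define ranges, and each range receives one new value. A minimal Python sketch of that range-based reclassification follows. It assumes that a value falling exactly on a break is assigned to the lower range and that values above the last break become NoData; both are assumptions of this sketch, so check how your software treats boundary values. The function name reclassify_ranges is hypothetical.

```python
import bisect


def reclassify_ranges(grid, breaks, new_values):
    """Give each cell the new value of the range it falls in.

    breaks are the ascending upper limits of the ranges, e.g. [500, 1000, 5000]
    for 0-500, 500-1000 and 1000-5000. In this sketch a value exactly on a break
    goes to the lower range, and values above the last break become NoData (None).
    """
    if len(breaks) != len(new_values):
        raise ValueError("need exactly one new value per range")
    result = []
    for row in grid:
        new_row = []
        for value in row:
            if value is None:
                new_row.append(None)  # NoData stays NoData
                continue
            i = bisect.bisect_left(breaks, value)  # index of first break >= value
            new_row.append(new_values[i] if i < len(breaks) else None)
        result.append(new_row)
    return result


# Hypothetical River_Dist values (meters).
river_dist = [[0.0, 250.0, 500.0],
              [707.1, 1000.0, 1414.2]]

river_score = reclassify_ranges(river_dist, [500, 1000, 5000], [3, 2, 1])
print(river_score)  # [[3, 3, 3], [2, 2, 1]]
```

The elevation and slope steps would use the same call with breaks [1000, 1300, 5500] and [5, 10, 35] respectively, each with new values [3, 2, 1].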

Step 6: Score Area for Elevation
Elevations will be reclassified as follows:
Elevation (m)    Score
< 1000           3
1000-1300        2
> 1300           1

Use the same procedure as for distance to rivers to reclassify the DEM_100M layer: set 3 break values at 1000, 1300, and 5500, and name the Output raster Elev_Score. Again, be sure to set the New values to 3 for 699-1000, 2 for 1000-1300, and 1 for 1300-5500.

The resulting map classifies elevations into 3 score groups.

Step 7: Score Area for Slope
Slope values will be classified as follows:
Slope (degrees)  Score
< 5              3
5-10             2
> 10             1

Use the Spatial Analyst toolbar to create a slope grid from the DEM_100M elevation layer: use DEM_100M as the Input surface, Degrees as the Output measurement, a Z factor of 1, and an Output cell size of 100, and name the Output raster Slope_Grid.
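The Slope tool estimates, for each cell, the steepest rate of elevation change from the cell's 3 x 3 neighborhood. The Python sketch below uses Horn's finite-difference weights, which we believe are close to what ArcGIS uses, but treat that as an assumption. The function name slope_degrees is hypothetical, and edge cells are simply left as NoData (None) here rather than handled the way ArcGIS handles them.

```python
import math


def slope_degrees(dem, cell_size=100.0, z_factor=1.0):
    """Slope in degrees from a 3 x 3 neighborhood (Horn-style weights).

    Neighbors are labeled
        a b c
        d e f
        g h i
    Edge cells lack a full neighborhood and are left as NoData (None) here.
    """
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for col in range(1, cols - 1):
            a, b, c = dem[r - 1][col - 1], dem[r - 1][col], dem[r - 1][col + 1]
            d, f = dem[r][col - 1], dem[r][col + 1]
            g, h, i = dem[r + 1][col - 1], dem[r + 1][col], dem[r + 1][col + 1]
            # Weighted east-west and north-south rates of change (rise over run).
            dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_size)
            dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell_size)
            rise_over_run = z_factor * math.hypot(dz_dx, dz_dy)
            out[r][col] = math.degrees(math.atan(rise_over_run))
    return out


# Hypothetical DEM rising 100 m per 100 m cell toward the east: a 45 degree slope.
dem = [[700, 800, 900] for _ in range(3)]
slope_grid = slope_degrees(dem)
print(round(slope_grid[1][1], 6))  # 45.0
```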

Once the slope grid is created, use the Reclassify tool to reclassify Slope_Grid with break values at 5, 10 and 35, setting the New values to 3 for 0-5, 2 for 5-10, and 1 for 10-35. Name the Output raster Slope_Score.

Step 8: Create Composite Score
All the individual factors have now been scored with a 3 for the areas most likely to contain sites and a 1 for the areas least likely to contain a site. The composite score is created by simply adding the 4 individual factor scores together, so the resulting grid will have values ranging from a minimum of 4 (a score of 1 on each factor) to a maximum of 12 (a score of 3 on each factor).
Click on the Spatial Analyst toolbar and select Raster Calculator from the popup menu.

In the Raster Calculator window, first double-click Elev_Score to add it to the expression.

Next click the + button, then LType_Score, +, River_Score, +, and Slope_Score to produce the following expression:
[Elev_Score] + [LType_Score] + [River_Score] + [Slope_Score]

Click the Evaluate button and the new composite score layer will be created. The new layer will be added to the table of contents with a name similar to Calculation.

This calculation layer is only a temporary layer stored in the Spatial Analyst temporary working directory. To save it as a permanent layer with a more appropriate name, right-click its entry in the table of contents, select Make Permanent from the popup menu, save the layer as Total_Score, and rename it Total Score in the table of contents.
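The Raster Calculator expression adds the four score grids cell by cell. A minimal Python sketch, assuming each grid is a list of rows on the same 100 m grid and that a NoData (None) cell in any input makes the output cell NoData (we assume this mirrors Map Algebra behavior but have not confirmed it for every version). The function name add_grids and the sample scores are hypothetical.

```python
def add_grids(*grids):
    """Cell-by-cell sum of equally sized grids.

    In this sketch a NoData (None) cell in any input makes the output cell NoData.
    """
    rows, cols = len(grids[0]), len(grids[0][0])
    if any(len(g) != rows or any(len(row) != cols for row in g) for g in grids):
        raise ValueError("all grids must have the same dimensions")
    total = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            values = [g[r][c] for g in grids]
            out_row.append(None if None in values else sum(values))
        total.append(out_row)
    return total


# Hypothetical 2 x 2 factor score grids.
elev_score = [[3, 2], [1, None]]
ltype_score = [[3, 3], [1, 2]]
river_score = [[3, 1], [1, 3]]
slope_score = [[3, 2], [1, 3]]

total_score = add_grids(elev_score, ltype_score, river_score, slope_score)
print(total_score)  # [[12, 8], [4, None]]
```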

Step 9: Reclassify Composite Score
The steps above created a layer with 9 score levels (4 through 12). Just as the scores for the individual factors were reduced from many levels to 3 to make them more interpretable, the same should be done with the composite score.
The individual factor scores were created by comparing the distribution of the 100 training sites with each factor. To decide how to group the composite scores, we first need to know how the training sites are distributed across the composite score levels.
Click on the Spatial Analyst toolbar and select Zonal Statistics from the popup menu. In the Zonal Statistics window set Sites_Training as the zone dataset, SiteNo as the zone field, Total Score as the Value raster, uncheck the Ignore NoData and Chart Statistic checkboxes, check the Join output table checkbox and save the output table as Training_Site_Total_Scores.

Right-click Sites_Training in the table of contents and open the attribute table. Click the Options button and select Add Field. Add a new field named Total_Score with data type Long Integer.

Once the field has been created, right-click the field name and select Calculate Values. Set the new field equal to [Training_Site_Total_Scores.MEAN], then click OK.

Right click on Sites_Training in the table of contents and select Joins and Relates -> Remove Joins -> Remove All Joins. This will remove all the extraneous fields from the table view.

Right-click the Total_Score field in the attribute table and select Summarize. In the Summarize window, save the output table as Training_Site_Total_Score_Summary and click OK.

Open the resulting summary table to view the number of sites at each Total Score level.
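Because each training site is a point, it should fall within a single 100 m cell, so the MEAN reported by Zonal Statistics should simply be the Total Score of that cell (an assumption about how point zones are handled). The Python sketch below reads that cell directly from map coordinates and then counts sites per score, which is what the Summarize step produces. The grid origin, the coordinates, and the function name cell_value_at are hypothetical.

```python
from collections import Counter


def cell_value_at(grid, x, y, x_min, y_max, cell_size=100.0):
    """Value of the grid cell containing map coordinate (x, y).

    Assumes row 0 lies along the northern edge (y_max) and column 0 along the
    western edge (x_min). Points outside the grid return NoData (None).
    """
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None


# Hypothetical Total_Score grid, upper-left corner, and site coordinates.
total_score = [[12, 8, 10],
               [4, 11, 9]]
x_min, y_max = 500000.0, 4200200.0
sites = {1: (500050.0, 4200150.0),
         2: (500150.0, 4200150.0),
         3: (500150.0, 4200050.0),
         4: (500250.0, 4200050.0),
         5: (500120.0, 4200060.0)}

site_scores = {site_no: cell_value_at(total_score, x, y, x_min, y_max)
               for site_no, (x, y) in sites.items()}
summary = sorted(Counter(site_scores.values()).items())
print(site_scores)  # {1: 12, 2: 8, 3: 11, 4: 9, 5: 11}
print(summary)      # [(8, 1), (9, 1), (11, 2), (12, 1)]
```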

From the summary table it is clear that the majority of the sites have a composite score of 9 or greater. Once again, the question is whether this reflects sites being preferentially located in high-scoring areas or whether sites simply occur in the same proportions as the underlying geographic distribution of the scores.
To answer this question, right-click the Total Score layer in the table of contents and open its attribute table, which lists the number of cells (Count) at each score value.

Combining the training site summary table with the Total Score attribute table and computing the percentage that each score level represents yields the following table:
Score   # Cells   # Sites   % Cells   % Sites
4       466       0         1         0
5       1709      1         5         1
6       2403      2         7         2
7       3200      1         9         1
8       6716      7         18        7
9       7274      13        20        13
10      6719      18        18        18
11      4518      27        12        27
12      3453      31        9         31

An examination of the table indicates that in areas with a score of 11 or 12, sites occur substantially more often than we would expect by chance (22% of the cells but 58% of the training sites). Areas with scores of 8 through 10 contain fewer sites than chance would predict (57% of the cells but 38% of the sites), although the shortfall comes from scores 8 and 9, while score 10 matches chance. Areas with scores of 4 through 7 have substantially fewer sites than would be expected by chance (21% of the cells but 4% of the sites).
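The comparison in the table can be reproduced directly from the cell and site counts. The Python sketch below uses the numbers from the table above and adds a ratio of % sites to % cells (our addition, not part of the original exercise): a ratio above 1 means more training sites than chance would predict, and a ratio below 1 means fewer. Percentages are computed from the raw counts, so they can differ slightly from the rounded table values.

```python
# Counts from the table above (composite score -> number of cells / training sites).
cells = {4: 466, 5: 1709, 6: 2403, 7: 3200, 8: 6716,
         9: 7274, 10: 6719, 11: 4518, 12: 3453}
sites = {4: 0, 5: 1, 6: 2, 7: 1, 8: 7,
         9: 13, 10: 18, 11: 27, 12: 31}

total_cells = sum(cells.values())
total_sites = sum(sites.values())

ratios = {}
print("Score  %Cells  %Sites  Ratio")
for score in sorted(cells):
    pct_cells = 100 * cells[score] / total_cells
    pct_sites = 100 * sites[score] / total_sites
    ratios[score] = pct_sites / pct_cells  # > 1: more sites than chance predicts
    print(f"{score:>5}  {pct_cells:6.1f}  {pct_sites:6.1f}  {ratios[score]:5.2f}")
```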

Using this information, we can collapse the 9 composite scores into 3 probability levels for site occurrence:
Total Score    Site Probability
11, 12         3 = Hi
8-10           2 = Med
4-7            1 = Low

Use the Reclassify tool to reclassify the Total Score layer and create a new layer called Site_Prob with the 3 levels listed above.
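Collapsing the 9 composite scores into 3 levels also collapses their cell and site counts. This short Python sketch uses the counts from the Step 9 table; it shows that the Hi, Med and Low levels cover 7971, 20709 and 7778 cells (the cell counts that appear in the Step 10 table) and hold 58, 38 and 4 of the training sites. The dictionary names are hypothetical.

```python
# Composite score -> Site_Prob level (3 = Hi, 2 = Med, 1 = Low), per the table above.
SITE_PROB = {11: 3, 12: 3, 8: 2, 9: 2, 10: 2, 4: 1, 5: 1, 6: 1, 7: 1}

# Cell and training-site counts per composite score, from the Step 9 table.
cells_by_score = {4: 466, 5: 1709, 6: 2403, 7: 3200, 8: 6716,
                  9: 7274, 10: 6719, 11: 4518, 12: 3453}
sites_by_score = {4: 0, 5: 1, 6: 2, 7: 1, 8: 7,
                  9: 13, 10: 18, 11: 27, 12: 31}

cells_by_level = {3: 0, 2: 0, 1: 0}
sites_by_level = {3: 0, 2: 0, 1: 0}
for score, level in SITE_PROB.items():
    cells_by_level[level] += cells_by_score[score]
    sites_by_level[level] += sites_by_score[score]

print(cells_by_level)  # {3: 7971, 2: 20709, 1: 7778}
print(sites_by_level)  # {3: 58, 2: 38, 1: 4}
```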

The resulting map shows that the majority of the training sites fall in the high or medium probability zones.

This result should not surprise us, however, because the model was built by first examining the relationship of these training sites to the underlying distribution of the 4 geographic variables. Since the model was built from these data, we cannot logically use the same data to validate it. One way to validate the model is to compare its results with the distribution of sites that were not used to formulate it: for us, these are the 100 sites in the Sites_Test layer.

Step 10: Test the Model
Since the 100 sites in the Sites_Test layer were not used to create the model, we can use them to test its effectiveness.
Using the same procedures as above, use the Zonal Statistics tool to link the Sites_Test layer to the Site_Prob data set.

Add a Long Integer field called Site_Prob to the Sites_Test attribute table, calculate its value to equal the MEAN field of the joined table, and then summarize the Site_Prob field to produce a table of test sites by probability level.

The Site_Prob attribute table gives the number of cells at each probability level.

Combining the two tables and calculating percentages yields the following distribution:
Site_Prob   # Cells   # Sites   % Cells   % Sites
3 = Hi      7971      50        22        50
2 = Med     20709     45        57        45
1 = Low     7778      5         21        5

This shows that although the Hi probability cells cover only 22% of the study area, they contain 50% of the test sites. The Med probability cells cover 57% of the study area but contain only 45% of the test sites, and the Low probability cells cover 21% of the study area but contain only 5% of the test sites.
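One way to put a number on "expected by chance" (not part of the original exercise) is a chi-square goodness-of-fit test, which compares the observed test-site counts with the counts expected if sites were spread in proportion to the area of each zone. A minimal Python sketch using the counts from the table above; 5.991 is the standard chi-square critical value for 2 degrees of freedom at the 0.05 level.

```python
# Step 10 counts: cells and test sites in each Site_Prob level.
cells = {"Hi": 7971, "Med": 20709, "Low": 7778}
test_sites = {"Hi": 50, "Med": 45, "Low": 5}

total_cells = sum(cells.values())
total_sites = sum(test_sites.values())

# Sites expected in each level if sites were spread in proportion to area.
expected = {level: total_sites * cells[level] / total_cells for level in cells}

chi_square = sum((test_sites[level] - expected[level]) ** 2 / expected[level]
                 for level in cells)
CRITICAL_DF2_ALPHA_05 = 5.991  # chi-square critical value, 2 df, alpha = 0.05

for level in cells:
    print(f"{level:<4} observed {test_sites[level]:>3}  expected {expected[level]:5.1f}")
print(f"chi-square = {chi_square:.1f} (df = 2); "
      f"exceeds critical value: {chi_square > CRITICAL_DF2_ALPHA_05}")
```

With these counts the statistic is about 51, far above the critical value, so the test sites are very unlikely to be distributed in proportion to area alone. The test does not say which zone drives the difference; the zone-by-zone comparison in the text does that.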
It appears from this that our factors do a good job of predicting the presence of sites in the high probability zones and their absence in the low probability zones. The medium probability zone remains problematic, as it actually contains fewer sites than we would expect by chance alone.
This relatively simple modeling exercise shows that even a few variables, used in a simplified way, can predict the general locations of many of the sites. It should also be noted, however, that the model's inability to predict even the general location of the other 50% of the sites should prompt us to consider how the model could be improved.
GIS modeling of archaeological data can be an effective way to examine underlying spatial patterns, but simple models often fail to account for the wide range of variability in human decision making. As with all models, they can guide how we proceed, but their results should almost always be treated with some skepticism.

Questions to think about
Can you think of ways that this modeling strategy could have been improved to do a better job at identifying cells in the medium probability area? Are there other ways that the four individual factors could have been combined to produce a preferable composite score? Are there other factors that you think could have been incorporated that could have improved the analysis?


© 2003 MATRIX
Project Director: Anne Pyburn
Indiana University Bloomington