Archaeological GIS | Dean Snow

 


GIS In Archaeology

Lab Exercise 11

Step 1: Create Study Area Map
Open ArcCatalog and ArcMap and create a new map in ArcMap
Using drag and drop from ArcCatalog or the Add Data function from ArcMap add the following layers from the Morelos geodatabase to the map document:
Sites
Survey_Limits
Rivers
Ag_Potential
In addition add the following raster based elevation layer from the Mexico folder:
DEM_100M

Step 2: Build Site Prediction Layers
Right click on Ag_Potential in the table of contents and open the attribute table. The table presently stores information on the Land Type (LType), the length of the perimeter of the polygon (Shape_Length), and the area of the polygon (Shape_Area). Currently the areas of the polygons are calculated in square meters because the meter is the native mapping unit for a feature class using the UTM projection. While the area measurements are accurate, the numbers are so large that they become relatively hard to interpret.
For this exercise we will instead measure polygon areas in hectares. A hectare is a parcel of land measuring 100 meters X 100 meters. Therefore 1 hectare is equal to 10,000 square meters (100 X 100 = 10,000).
In the attribute table click on the Options button and select Add Field from the popup menu. At the Add Field window specify the name of the new field to be Area_Ha, and set it to be data Type Float.

Once the field is added to the attribute table right click on the field heading and select Calculate Values from the popup menu. Use the formula Area_Ha = Shape_Area / 10000 to calculate the area of the polygons in hectares.
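
The field calculation above is a plain unit conversion. As a minimal sketch outside ArcMap, it could be written in Python as follows (the function name square_meters_to_hectares is illustrative only and is not part of ArcMap):

```python
SQUARE_METERS_PER_HECTARE = 100 * 100  # 100 m x 100 m = 10,000 square meters


def square_meters_to_hectares(shape_area_m2):
    """Convert an area in square meters to hectares."""
    return shape_area_m2 / SQUARE_METERS_PER_HECTARE


# A hypothetical polygon covering 2,500,000 square meters
print(square_meters_to_hectares(2_500_000))  # 250.0
```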

As we did in an earlier exercise we will convert the Land Type classes from their ranked classification system to a numeric classification system. Agricultural Productivity for the various land classes will be:
LType Prod_Index
1 100
2 90
3 70
4 60
5 40
6 40
7 20
8 20
9 0

Add a new field to the attribute table. Name the new field Prod_Index, and define it to be of data type Long Integer.
Pick the Select By Attributes tool from the Selection menu. In the Select By Attributes window use the criteria LType = 1 then click the Apply button. Looking at the Ag_Potential attribute table you should see that 37 of the polygons were selected. In the attribute table right click on the field heading for Prod_Index, select Calculate Values, and set Prod_Index = 100.


If you scroll down the attribute table you will see that the Prod_Index was only set to 100 for those 37 records that were selected. Repeat the above procedure to select LType = 2 and set its Prod_Index to be 90, then repeat this same process for the rest of the land types and assign them the Productivity Indices specified above.
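
The select-and-calculate routine above amounts to a lookup from LType to Prod_Index. A minimal Python sketch of that lookup is given here, assuming each attribute row is represented as a dictionary; the row data and the function name assign_prod_index are hypothetical and only illustrate the logic, not how ArcMap stores the table:

```python
# Productivity index for each land type, copied from the table above
PROD_INDEX_BY_LTYPE = {1: 100, 2: 90, 3: 70, 4: 60, 5: 40, 6: 40, 7: 20, 8: 20, 9: 0}


def assign_prod_index(records):
    """Return copies of the records with a Prod_Index value added."""
    return [dict(r, Prod_Index=PROD_INDEX_BY_LTYPE[r["LType"]]) for r in records]


# Hypothetical attribute rows, one per polygon
polygons = [{"LType": 1}, {"LType": 5}, {"LType": 9}]
for row in assign_prod_index(polygons):
    print(row)
```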

Step 3: Create Predictive Model Control Group
To inductively build a GIS model of archaeological site locations it is necessary to have a control group of sites with which to build the model. If we use our site data to build the model we cannot use those same data to test the model. In this case we will build our model by splitting the archaeological sites into two groups of 100 sites each. The first group of 100 sites will be used to build the predictive model. Once the model is built we will use the second group of 100 sites to test how well the model predicts the location of archaeological sites.

Add the Geostatistical Analyst toolbar to your map window. From the main ArcMap menu select View -> Toolbars. From the popup window make sure that there is a checkmark next to the Geostatistical Analyst entry. If there is no checkmark next to the entry, click it and the toolbar will be added to the map window.

Click on the arrow next to Geostatistical Analyst on the toolbar and select Create Subsets… from the popup menu.

At the Create Subsets window select Sites as the Input Layer. Hit the Next > button to continue.

The following window allows the user to separate the data layer into two groups. One group is for training (the control group) and the other group is for testing the model developed from the control group. Select 50% as the Percent for each group and accept the default Subset Names Sites_training and Sites_test.

Note that sites are randomly assigned to the two data subsets, so no two random assignments will be the same. For the rest of this exercise the numbers reported here will therefore differ from your results, although they should be reasonably close.
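
The idea behind the random split can be sketched in a few lines of Python. This is only an illustration of a random 50/50 partition, not the algorithm that Geostatistical Analyst uses internally; the function name split_sites, the site IDs 1 to 200, and the seed value are all assumptions made for the example:

```python
import random


def split_sites(site_ids, train_fraction=0.5, seed=None):
    """Randomly split site IDs into a training group and a test group."""
    rng = random.Random(seed)      # a fixed seed makes the split repeatable
    shuffled = list(site_ids)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# 200 hypothetical site IDs split into two groups of 100
training, test = split_sites(range(1, 201), seed=42)
print(len(training), len(test))  # 100 100
```

Running the function without a seed gives a different assignment each time, which is why your counts in the following steps will not exactly match the ones reported here.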

Step 4: Overlay Control Group Sites on Land Type
Right click on Sites_training in the table of contents frame and from the popup menu select Joins and Relates -> Join…

In the Join Data window select “Join data from another layer based on spatial information”. Select Ag_Potential as the join layer, and select that each point will be given the attributes of the polygon that “it falls inside.” Specify Sites_training_Ag_Potential as the output feature class for this new layer.
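
Conceptually, this spatial join tests which polygon each site point falls inside and copies that polygon's attributes to the point. The following Python sketch illustrates the idea with a simple ray-casting test. It is a simplified stand-in, not ArcMap's implementation: it ignores holes, multipart polygons, and points lying exactly on a boundary, and the coordinates, dictionary layout, and function names are hypothetical:

```python
def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does a horizontal ray running right from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(sites, polygons):
    """Give each site the LType of the first polygon that contains it."""
    joined = []
    for site in sites:
        ltype = None
        for poly in polygons:
            if point_in_polygon(site["x"], site["y"], poly["vertices"]):
                ltype = poly["LType"]
                break
        joined.append(dict(site, LType=ltype))
    return joined


# Two hypothetical square land-type polygons and three hypothetical sites
polygons = [
    {"LType": 1, "vertices": [(0, 0), (10, 0), (10, 10), (0, 10)]},
    {"LType": 4, "vertices": [(10, 0), (20, 0), (20, 10), (10, 10)]},
]
sites = [{"x": 5, "y": 5}, {"x": 15, "y": 5}, {"x": 25, "y": 5}]
for row in spatial_join(sites, polygons):
    print(row)
```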

Step 5: Calculate the number of sites on each land type
a. Summarize LType Field
Open the Sites_Training_Ag_Potential attribute table. Right click on the heading for the LType field and choose Summarize from the popup menu.

Specify the output table to be a new table in the Morelos geodatabase named Training_Sites_Ag_Potential_Summary. Click on Yes when you are asked if you want to add the result table in the map.

In the summary table the field Count_LType lists the number of sites that occur on each of the different land types. The name Count_LType is hard to remember so we will store these data in a field with a more easily identifiable name.

Open the Training_Sites_Ag_Potential_Summary table. Click the Options button to add a new field.

Create the new field to be named Observed_Site_Count with data Type Long Integer.

Once the field is added to the data table right click on the field name and select Calculate Values.

Calculate the values for this field to be the same as the values in the Count_LType field.
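
The Summarize operation in this step is a frequency count of sites per land type. A minimal Python sketch of the same tally, using a hypothetical list of LType values (one per site) and an illustrative function name:

```python
from collections import Counter


def count_sites_by_ltype(site_ltypes):
    """Count how many sites fall on each land type, sorted by land type."""
    return dict(sorted(Counter(site_ltypes).items()))


# Hypothetical LType values for seven sites
ltypes = [1, 1, 2, 5, 1, 2, 8]
print(count_sites_by_ltype(ltypes))  # {1: 3, 2: 2, 5: 1, 8: 1}
```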

Step 6: Calculate the land type coverage in the study area
The results of the previous steps tell us how many sites are found on each of the different land classes. While certain land types have more sites than others, we cannot yet determine whether people were preferentially selecting certain land classes over others. In order to answer this question we need to compare the distribution of sites to the distribution of land types in the study area.

The current Agricultural Potential layer maps the land classes over a region substantially larger than the archaeological study area. To compare the site distributions to the land classes we must limit our summary of land classes to only those that fall within the survey area.

Use the Clip tool from the Geoprocessing tools to create a subset of the Agricultural Potential feature class.

Specify the Agricultural Potential as the Input layer and Survey_Limits as the clip layer. The Output layer from this operation will be Survey_Area_Ag_Potential.

The result of this operation is a layer that “cookie cutters” out just the agricultural potential polygons that fall within the study area. Right click on Survey_Area_Ag_Potential in the table of contents and bring up the properties. Select the Symbology tab and click the Import button to import the layer symbology from the Agricultural_Potential layer. The clipped layer now blends seamlessly in with the original layer. Turn the original Agricultural Potential layer off to verify that the new layer contains only the land inside the study area.

Open the attribute table for the Survey Area Ag Potential layer. Because we stored this as a feature class in the geodatabase the Shape_Area field values have been automatically updated to reflect the changes. The values in the Area_Ha field, however, have not been updated, so we need to recalculate them. Right click on the Area_Ha field and select Calculate Values. Update the field values to be Shape_Area / 10000.

Step 7: Summarize the study area land classes
To summarize the quantity of each of the land classes in the study area return to the attribute table for the Survey Area Ag Potential layer, right click on the LType field and select Summarize from the popup menu.

On the Summarize window check the Sum entry under Area_Ha, and change the output table to be Survey_Area_Ag_Potential_Summary.

Open the Survey_Area_Ag_Potential_Summary. The Count_LType field contains the number of individual polygons that each land class had in the study area, while the Sum_Area_Ha contains the total number of Hectares that these polygons covered.

Right click on Sum_Area_Ha and select Statistics. Notice that the Sum of the field is approximately 36,462 hectares - this is the total area of the survey area.

If sites are randomly distributed within the study area then we would expect the percentage of sites on a given land class to be the same as the percentage of the study area covered by that land class. Since there are 100 sites in the Sites Training layer we would expect that the number of sites for each land class would be: 100 * Area_Ha / 36462, where Area_Ha is the total area of that land class in hectares.

Return to the Survey_Area_Ag_Potential_Summary table. Add a New Field, call it Expected_Site_Count, data Type Float, Precision 5, Scale 1 (precision is the total number of digits stored and scale is the number of decimal places, so a scale of 1 will result in the numbers being rounded off to the first decimal place).

Right click on the new field, select Calculate Values and enter the formula 100 * Sum_Area_Ha / 36462 (the summary table stores the total area of each land class in the Sum_Area_Ha field).
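
The expected count is simply each land class's share of the study area multiplied by the number of training sites. A small Python sketch of that calculation follows; the areas are hypothetical round numbers chosen for readability, not values from the Morelos data, and the function name is illustrative:

```python
def expected_site_counts(area_ha_by_ltype, n_sites):
    """Expected sites per land class if sites were placed at random."""
    total_ha = sum(area_ha_by_ltype.values())
    return {ltype: n_sites * area / total_ha
            for ltype, area in area_ha_by_ltype.items()}


# Hypothetical land class areas in hectares
areas = {1: 5000.0, 2: 3000.0, 8: 2000.0}
expected = expected_site_counts(areas, 100)
print(expected)  # {1: 50.0, 2: 30.0, 8: 20.0}
```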

Step 8: Compare observed to expected site frequencies by land class
Open the tables Survey_Area_Ag_Potential_Summary and Training_Sites_Ag_Potential_Summary and arrange the windows so that you can see both on screen at the same time. Look at the numbers of observed versus expected site counts for the various land classes. In some cases, such as land class 1, the observed site counts exceed the expected site counts, while in other cases, such as land class 8, the number of expected sites is greater than the number of observed sites.

To more adequately summarize these data we will combine the two values to calculate the observed versus expected site ratios.

Right click on the Survey Area Ag Potential layer and select Joins and Relates -> Join from the popup menu. Modify the selections in the Join Data window to “Join attributes from a table”, and join to the table Survey_Area_Ag_Potential_Summary based on the LType field.

Repeat the above step and this time join the data to the Training_Sites_Ag_Potential_Summary table based on LType.

The above procedure joins both the expected and observed fields, among others, to the data table for the Survey Area Ag Potential layer.

Open the Survey Area Ag Potential layer’s attribute table. Add a new field called Expected_Observed_Ratio. This should be a data Type Float field with a Precision of 5 and a Scale of 1.

Once the field is added to the attribute table right click on the field and select Calculate Values from the popup menu.

Calculate the ratio by dividing the observed site count by the expected site count. Note that although the field is named Expected_Observed_Ratio, the value stored is observed divided by expected.

The result of this operation will be values greater than 1 where the observed site count is higher than we would have expected by random chance, and less than one where the observed number of sites is less than what random chance would have predicted.
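
As a quick illustration of how the ratio behaves, here is a minimal Python sketch. The counts are hypothetical, and the guard against a zero expected count is an assumption added for the example rather than something the lab procedure performs:

```python
def observed_expected_ratio(observed, expected):
    """Observed site count divided by expected site count."""
    if expected == 0:
        return None  # ratio is undefined when no sites are expected
    return observed / expected


# Hypothetical land classes: one with more sites than expected, one with fewer
print(observed_expected_ratio(30, 20.0))  # 1.5
print(observed_expected_ratio(5, 20.0))   # 0.25
```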

To simplify the Survey Area Ag Potential attribute table we will remove the preceding join operations. Right click on the layer in the table of contents and select Joins and Relates -> Remove Joins -> Remove all Joins from the popup menu.

Return to the attribute table, right click on the field LType, select Summarize from the popup menu, click the checkbox for Minimum under the Expected_Observed_Ratio, and rename the output table to be LType_Observed_Expected_Ratios in the Morelos geodatabase.

Once the operation is complete open the LType Observed Expected Ratios table. An examination of the results indicates that, based on our training sites, Land Classes 1, 2, 3, and 5 have more sites than would be predicted by chance, Land Classes 7 and 8 have fewer than would be predicted by chance, and Land Classes 4 and 6 have approximately the number that we would expect by chance.

Step 9: Map observed versus expected site frequencies by land class
Inspecting the data from the data tables gives us a sense of which land classes seem to have more, or fewer, sites than expected. Another way to view the data is by mapping the distribution of these values over space.

First we will create a binary output map where the land classes will be classified into two groups: those with an Observed Expected Ratio > 1 and those with an Observed Expected Ratio < 1.

Return to the map document and reorder the layers by moving Survey Area Ag Potential beneath the sites, rivers and survey limits layers. Right click on the Survey Area Ag Potential entry and select Properties from the popup menu. Go to the Symbology tab and choose Graduated Colors, using Expected_Observed_Ratio as the value field, and change the number of classes to 2 with the break point at 1.0.


Apply the new symbology and return to the map document. Turn off all the site layers except for the Sites_Training_Ag_Potential layer. The map should look similar to the one below.

Notice that many of the sites do fall into the Observed Expected Ratio > 1 category but a significant number of sites are found in areas with a ratio < 1.

To determine exactly how many sites fall on land classes with a ratio > 1 go to the Selection menu and open the Select By Attributes tool. Select Survey Area Ag Potential as the layer to query and set the query constraint to be Expected_Observed_Ratio > 1.

Next open the Select By Location tool from the Selection menu and select features from the Sites Training Ag Potential layer that intersect the selected features of Survey Area Ag Potential.


Close the selection tool windows and open the Sites Training Ag Potential attribute table to see that 68 of the sites (or 68%) fall on land classes with expected observed ratios > 1.

Open the Survey Area Ag Potential attribute table and verify that 221 of the 314 polygons are currently selected. Right click on the Area_Ha field and select Statistics from the popup menu. The Statistics results window indicates that the sum of the selected records is approximately 19,278 hectares.

Since the survey area as a whole contains 36,462 hectares this means that approximately 53% (19278/36462 = 52.9%) of the study area consists of land classes with an expected to observed site ratio greater than 1.

What we have created is a very simple predictive model using land class to predict the likelihood of site presence or absence. As a general rule if the land class that is being surveyed is one of the land classes with a ratio greater than 1 (Land classes 1 thru 5) we think it is more likely to have a site than if the survey area fell on land classes 6 thru 8.

Looking at the data more critically we see that 53% of the study area is made up of land classes 1-5 and this area contains 68% of the sites used to build the model. In general terms this does not appear to be an overly sensitive model. While land class appears to have some explanatory power as a general rule it isn’t a very good predictor of site locations.
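
The comparison in the preceding paragraphs can be reproduced with a few lines of arithmetic. The Python sketch below uses the figures reported in the text (19,278 of 36,462 hectares and 68 of 100 training sites); your own numbers will differ because of the random subset. The function name share_percent is illustrative only:

```python
def share_percent(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


# Figures reported in the text above
area_share = share_percent(19278, 36462)  # land classes with a ratio > 1
site_share = share_percent(68, 100)       # training sites on those classes

print(f"{area_share:.1f}% of the area holds {site_share:.0f}% of the sites")
print(f"site share / area share = {site_share / area_share:.2f}")
```

A quotient close to 1 means the favored land classes hold little more than their proportional share of sites, which is one way to see why the model is not very sensitive.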

Step 10: Build and test a 3-level classification model
Using procedures similar to those outlined above try to build a new model that separates the 8 land classes into 3, rather than 2, major groups. The 3 groups should be made up of lands with an expected to observed site ratio > 1.1, those with a ratio of 0.9 to 1.1, and those with a ratio less than 0.9.
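
The three-way grouping can be expressed as a simple threshold rule. The following Python sketch is one possible way to write it; the group labels and the function name classify_ratio are assumptions for illustration, and the thresholds are the 0.9 and 1.1 values given above:

```python
def classify_ratio(ratio, low=0.9, high=1.1):
    """Assign a land class to one of three groups based on its ratio."""
    if ratio > high:
        return "more sites than expected"
    if ratio < low:
        return "fewer sites than expected"
    return "about as expected"


# Hypothetical ratios for three land classes
for ratio in (1.5, 1.0, 0.25):
    print(ratio, "->", classify_ratio(ratio))
```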

Evaluating the Impact of Distance to Water on settlement locations

Step 11: Create map layers
Return to the ArcMap document window and verify that the map contains the Sites_training, Sites_test, Survey_limits, Agricultural Potential and Rivers layers.

A simple examination of the site distribution map with respect to rivers seems to indicate that most sites tend to be located relatively close to a major river. In this section we will analyze the level of impact that this factor had on settlement decisions in the region.

For this exercise sites will be measured for their relative nearness to the river by classifying each site as to whether it is located within 500 meters, within 1 km, or more than 1 km from the nearest large river. This information will then be compared to the overall placement of rivers within the study area.

Step 12: Create buffers around river layer
We will identify all sites that fall within 500 meters, within 1 km, and more than 1 km from a river by creating 500 meter and 1 km buffer zones around the river coverage and then overlaying the site map on these buffers to determine which sites fall within each zone. Any site that falls within neither the 500 meter nor the 1 km buffer will be classified as being located more than 1 km from the river.

Use ArcMap’s buffer wizard to create a 500 meter buffer. Select the Buffer Wizard tool and specify the Rivers layer as the one to create buffers around, then hit the next button.

Specify 500 as the distance and Meters as the distance units for the buffers, then click the next button.

Select the Yes radio button to indicate that you want to dissolve barriers between overlapping buffers, and store the output into a feature class called River_Buffers_500M in the Morelos Geodatabase.

Once the buffer layer is created add it to the map document and move its position in the table of contents so that it is below the Rivers and Survey Limits layers.

Repeat the above procedure to create a 1000 meter (1 Kilometer) buffer around the rivers. Store this as feature class River_Buffers_1Km in the Morelos geodatabase.

Step 13: Identify sites and their distance to the nearest river
As in the previous portion of the exercise we will be basing our analysis on the 100 randomly chosen training (or control) sites. Open the Sites Training Ag Potential attribute table and add a new Long Integer field called River_Dist_Class.

We will code each site with a River_Dist_Class of 1 if a site is within 500 meters of a river, 2 if it is within 1 km. and 3 if it is farther than 1 km. from the nearest river. We will do this by first labeling all sites with river class 3 and then we will update the values of those within 1 km. and 500 meters to have their appropriate river class values.

Right click on the River_Dist_Class field heading and choose Calculate Values from the popup menu. In the Field calculator window specify 3 as the value for the field, then click OK.

Notice that now all sites contain the value of 3 for their river distance class. Go to the Selection menu and choose the Select By Location tool. Select features from the Sites Training Ag Potential layer that intersect the features of River_Buffers_1Km and click the Apply button.

Return to the attribute table and notice that 83 of the 100 sites were selected by this operation. Again right click on River_Dist_Class and select Calculate Values from the popup menu. This time specify a value of 2 for the field. Notice that the 83 selected sites all have their value updated to be 2 while the unselected sites retain 3 as their field value.

Repeat the above procedure to select all the sites that intersect the 500 meter river buffer and then calculate their new river distance class to be 1.

Right click on the Sites Training Ag Potential layer in the table of contents and click on Select -> Clear Selected Features from the popup menu to unselect all the selected sites in the layer.
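
The coding scheme in this step starts every site at class 3 and then overwrites the value for sites inside progressively smaller buffers. The Python sketch below mirrors that order of operations. It assumes a distance to the nearest river is already known for each site, which is a simplification: the lab itself never computes distances and relies on buffer intersections instead. The function name and sample distances are hypothetical:

```python
def river_dist_class(distance_m):
    """Code a site as 1 (within 500 m), 2 (within 1 km) or 3 (beyond 1 km)."""
    dist_class = 3                 # start by labeling every site class 3
    if distance_m <= 1000:
        dist_class = 2             # overwrite for sites inside the 1 km buffer
    if distance_m <= 500:
        dist_class = 1             # overwrite again inside the 500 m buffer
    return dist_class


# Hypothetical distances in meters from three sites to the nearest river
for d in (120, 750, 2300):
    print(d, "->", river_dist_class(d))
```

The order matters: because the 500 meter buffer lies entirely inside the 1 km buffer, the narrower class must be assigned last or it would be overwritten.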

Step 14: Summarize site distance to water data
Return to the Sites Training Ag Potential attribute table. Right click on the field River_Dist_Class and select Summarize from the popup menu. Change the output table to be Training_Sites_River_Dist_Summary in the Morelos geodatabase.

Examining the summary data table created above we can see that there are 61 sites within 500 meters of the nearest river, 22 that are 500 meters – 1 km. and 17 sites that are more than 1 km. from the nearest river.

Step 15: Comparing site river distances to the study area
By inspecting the summary table and the map it’s pretty clear that there is a preference for sites to be located closer to rivers. The above observations, however, don’t give us an objective numerical way to compare the information to the general background distribution of rivers in the study area.
As in the procedure carried out for the Agricultural land analysis, we will compare the percentage of sites in each distance-to-water zone to the percentage of land in the survey area that falls within that zone. If sites are not being differentially located with respect to rivers then the percentage of sites in each zone should be the same as the percentage of the study area covered by that zone. If the percentages are not the same then it is likely that the inhabitants were drawn to live nearer to the major rivers.
Open the River_Buffers_500M attribute table and add a field called Area_Ha to the table. This field should be a float data type.


Once the field is added to the data table right click on it and select Calculate Values from the popup menu. Calculate the value to be Shape_Area / 10000 to store the area values as hectares rather than square meters.

Looking at the data table after this operation we see that the river buffer layer contains about 22,163 (5,125 + 17,038) hectares of land within 500 meters of a river.


Looking at the map, however, it is clear that much of this buffer area falls outside the study area limits and so those lands should be excluded from our analysis.

Follow the same procedure used with the Agricultural Potential layer to clip its extent to just the land that falls inside the study area. Store the result of this procedure as a new feature class called Survey_Area_500M_River_Buffers in the Morelos geodatabase.

The Shape_Area field is automatically updated with the area in square meters when the clip operation is performed. The Area_Ha field, however, still contains the old area calculation. Right click on Area_Ha and recalculate the value to be Shape_Area / 10000. To obtain the new total area, right click on the Area_Ha field and choose Statistics from the popup menu. Notice that there are only 9,446 hectares in the study area that are within 500 meters of a river.

If we divide the 9,446 hectares of land within 500 meters of a river by the total area of the survey area (36,462 hectares) we see that these buffers make up approximately 26% of the study area (9,446 / 36,462 = 25.9%). Looking at the site summary table, however, we see that 61% of all sites lie within this buffer.

Follow the steps outlined above to add a field Area_Ha to the River Buffers 1 Km layer. Then clip the River Buffers 1 Km to the extent of the Survey Limits layer and save the result as feature class Survey_Area_1Km_River_Buffers in the Morelos geodatabase. Finally, recalculate the values in the Area_Ha field of the clipped layer. An examination of the attribute table indicates that approximately 17,280 hectares of the study area lies within 1 Km. of a major river.

Looking back at the summary table of sites and their distance to water, there were 22 sites (or 22%) that were located between 500 meters and 1 Km. from the nearest river. Notice that the 17,280 hectares reported for the 1 Km. buffer contains not only the area that is 500-1000 meters from the river but also the land that is less than 500 meters from a river. To compare the 1 Km. river buffer to the sites data we need to subtract out the area that is less than 500 meters from the nearest river. From this operation we find that 17,280 – 9,446 = 7,834 hectares is located between 500 meters and 1 km. from the rivers. From this we can find that 22% of the sites lie in this zone and this zone covers 21% (7,834 / 36,462 = 21.5%) of the study area. It appears that sites are no more likely to be located in this zone than we would have predicted by random chance.
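
The subtraction and percentages above can be checked with a short calculation. This Python sketch uses the hectare figures reported in the text for the clipped buffers (your own values may differ slightly); the variable and function names are illustrative only:

```python
TOTAL_HA = 36462        # total survey area in hectares
within_500_ha = 9446    # clipped 500 m buffer
within_1km_ha = 17280   # clipped 1 km buffer (includes the 500 m buffer)

# Land between 500 m and 1 km is the 1 km buffer minus the 500 m buffer
ring_ha = within_1km_ha - within_500_ha


def percent_of_study_area(area_ha, total_ha=TOTAL_HA):
    """Return an area as a percentage of the study area."""
    return 100 * area_ha / total_ha


print(ring_ha)                                          # 7834
print(f"{percent_of_study_area(within_500_ha):.0f}%")   # 26%
print(f"{percent_of_study_area(ring_ha):.0f}%")         # 21%
```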

Finally, since 17,280 hectares are located within 1 Km. of a major river there are 19,182 (36,462 – 17,280) hectares in the study area that are more than 1 Km. from a major river. Comparing the site distribution to this measure we find that 17 (or 17%) of the sites are located in this zone but this zone makes up 53% of the study area.

The preceding analysis seems to indicate that sites are more likely to be located within 500 meters of a river, less likely to be located more than 1 Km. from a major river and their distribution is random between 500 and 1,000 meters from a major river.

Step 16: Save Map and Exit
Save the map you created in this lab. In the next lab exercise we will use these same data to investigate the possible relationship between site locations and the raster data sets of elevation and slope.


 


© 2003 MATRIX
Project Director: Anne Pyburn
Indiana University Bloomington