Archaeological GIS | Dean Snow

 


Lecture 05: Vector Data Creation

Lab Exercise 05: Vector Data Creation

GIS in Archaeology: Vector Data Creation

Data Acquisition

Public data sources -- a great deal of GIS data is available through third-party sources.

Data creation methods: remotely sensed data, GPS data, address data, paper map digitization.

Remotely Sensed Data

Digital Ortho Quads (DOQs) are geographically rectified aerial photos that can be used to identify geographic features for digitization or can be used as a background on which to draw features. Satellite imagery is single or multi-band geographically referenced imagery and can be used to identify features such as water, land cover and basic soil characteristics.

Global Positioning System (GPS) Data

This type of data provides geographic coordinates to users in the field. Data can be downloaded from the system to create georeferenced data. It is important to consider the accuracy of individual readings. Basic handheld systems can provide accuracy within 20 meters. With additional processing it is possible to obtain accuracy within 3-5 meters. With a coordinated base station you can get sub-meter accuracy.

Coordinate Input Files

Files containing x,y coordinates of points can be imported into a GIS and converted into geographic locations. You need to know the projection information associated with the point file, because the same pair of numbers refers to different places in different coordinate systems.
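As a rough illustration of importing a coordinate file, the sketch below parses comma-delimited text into point features. The function name `read_points`, the column names, the sample coordinates and the output layout (loosely modelled on GeoJSON, with a plain `crs` label) are all assumptions made for this example, not part of any particular GIS package.

```python
import csv
import io


def read_points(text, crs, x_field="x", y_field="y"):
    """Parse delimited text with x,y columns into GeoJSON-style point features.

    crs is required on purpose: bare coordinates cannot be placed on the
    earth without knowing their projection.
    """
    if not crs:
        raise ValueError("projection information (crs) is required")
    features = []
    for row in csv.DictReader(io.StringIO(text)):
        x = float(row.pop(x_field))
        y = float(row.pop(y_field))
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [x, y]},
            "properties": dict(row),  # remaining columns become attributes
        })
    return {"type": "FeatureCollection", "crs": crs, "features": features}


# Hypothetical sample: two sites with coordinates assumed to be UTM meters.
sample = "site_id,x,y\nA1,540100.5,4335200.0\nA2,540350.0,4335410.25\n"
collection = read_points(sample, crs="EPSG:26916")  # assumed: NAD83 / UTM zone 16N
print(len(collection["features"]), "points in", collection["crs"])
```

Making `crs` a required argument mirrors the point above: the import should not proceed until the projection of the file is known.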

Street Addressing

Geocoded street coverages can be obtained from a variety of governmental and private sources. Geocoded street coverages contain information on zip codes, street names and address ranges. Software uses interpolation to map street addresses onto geocoded street coverages. Accuracy of the data varies depending on quality of original coverage and uniformity of address spacing along streets.
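The interpolation step can be sketched as follows. This is a minimal example that assumes a straight street segment and evenly spaced house numbers; the function name `interpolate_address` and the sample values are hypothetical. Real geocoders also handle odd/even sides of the street and curved segments, which are omitted here.

```python
def interpolate_address(house_number, from_addr, to_addr, start_xy, end_xy):
    """Estimate a location along a straight street segment.

    from_addr/to_addr are the address range stored on the segment;
    start_xy/end_xy are the coordinates of the segment's two ends.
    """
    low, high = min(from_addr, to_addr), max(from_addr, to_addr)
    if not low <= house_number <= high:
        raise ValueError("house number is outside the segment's address range")
    if to_addr == from_addr:
        fraction = 0.5  # single-address segment: place it at the midpoint
    else:
        fraction = (house_number - from_addr) / (to_addr - from_addr)
    x = start_xy[0] + fraction * (end_xy[0] - start_xy[0])
    y = start_xy[1] + fraction * (end_xy[1] - start_xy[1])
    return (x, y)


# Hypothetical segment: addresses 100-200 along 100 map units of street.
print(interpolate_address(150, 100, 200, (0.0, 0.0), (100.0, 0.0)))
```

Because the placement assumes uniform spacing, a house that actually sits near one end of the block will be mapped with an error, which is the accuracy limitation noted above.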

Paper Map Conversion

-On-screen digitization (aka heads up digitizing)

-Digitizing tablet

-Scanned maps

-Accuracy of these techniques is highly dependent on the accuracy of the original source map.

On-Screen Digitization (aka heads up digitizing)

The user manually relates features on a paper map to features on a georeferenced data set (often a Digital Ortho Quad or Digital Quad Sheet). Accuracy is based on the spatial accuracy of the original digital data set and the user's ability to accurately identify appropriate features on the paper and digital map sources.

Digitizing Tablet

A digitizing tablet contains a series of closely spaced sensors that accurately track the position of the tablet cursor (aka puck). Using a digitizing tablet, the user assigns geographic control points on the paper map. Based on these control points, the geographic coordinates of all other points on the tablet are known. Features are manually drawn using the cursor.

Scanned Maps

The original paper map is scanned. Geographic coordinates are assigned to points on the scanned image and then the rest of the image is georeferenced based on these control points. Georeferenced scanned images are then converted into a vector file by automated and/or manual means.
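One common way to georeference from control points is an affine transformation. The sketch below solves the six affine parameters exactly from three control points using Cramer's rule. The function names and the pixel and map coordinates are hypothetical, and in practice more control points with a least-squares fit would normally be used so that residual error can be checked.

```python
def fit_affine(src, dst):
    """Fit X = a*x + b*y + c, Y = d*x + e*y + f from exactly three control points.

    src: three (x, y) positions on the scanned image (e.g. pixel column, row).
    dst: the matching three (X, Y) geographic coordinates.
    """
    if len(src) != 3 or len(dst) != 3:
        raise ValueError("this sketch needs exactly three control points")
    (x1, y1), (x2, y2), (x3, y3) = src
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    if abs(det) < 1e-12:
        raise ValueError("control points are collinear")

    def solve(v1, v2, v3):
        # Cramer's rule for the 3x3 system [x y 1] * (a, b, c) = v
        a = (v1 * (y2 - y3) - y1 * (v2 - v3) + (v2 * y3 - v3 * y2)) / det
        b = (x1 * (v2 - v3) - v1 * (x2 - x3) + (x2 * v3 - x3 * v2)) / det
        c = (x1 * (y2 * v3 - y3 * v2) - y1 * (x2 * v3 - x3 * v2)
             + v1 * (x2 * y3 - x3 * y2)) / det
        return a, b, c

    return solve(*(p[0] for p in dst)) + solve(*(p[1] for p in dst))


def apply_affine(params, point):
    """Convert one image position to geographic coordinates."""
    a, b, c, d, e, f = params
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)


# Hypothetical control points: 2 map units per pixel, rows increasing downward.
pixels = [(0, 0), (100, 0), (0, 100)]
ground = [(500000, 4000100), (500200, 4000100), (500000, 3999900)]
params = fit_affine(pixels, ground)
print(apply_affine(params, (50, 50)))
```

Once the parameters are known, every other position on the image can be converted, which is the sense in which the rest of the image is georeferenced from the control points.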

Projective Transformation -- Rubbersheeting

Rubbersheeting is a process that corrects map distortions by adjusting a source layer to a geographically accurate target layer. The user specifies a series of control points that are identifiable on both layers. The features on the source layer are stretched to overlay them on the target layer. Rubbersheeting can differentially stretch different parts of the source layer (i.e. it does not preserve line parallelism).
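To make the idea of differential stretching concrete, the sketch below moves each point by an inverse-distance-weighted average of the control-point displacements. This weighting scheme is a simple stand-in chosen for illustration; GIS packages may use other methods (for example piecewise adjustments between control points), and the function name and sample links are hypothetical.

```python
import math


def rubbersheet(point, links, power=2.0):
    """Shift a source-layer point toward the target layer.

    links: list of ((source_x, source_y), (target_x, target_y)) control pairs.
    Nearby control points influence the shift more than distant ones.
    """
    if not links:
        raise ValueError("at least one control point link is required")
    px, py = point
    weight_sum = shift_x = shift_y = 0.0
    for (sx, sy), (tx, ty) in links:
        dist = math.hypot(px - sx, py - sy)
        if dist == 0.0:
            return (float(tx), float(ty))  # exactly on a control point
        weight = 1.0 / dist ** power
        weight_sum += weight
        shift_x += weight * (tx - sx)
        shift_y += weight * (ty - sy)
    return (px + shift_x / weight_sum, py + shift_y / weight_sum)


# Hypothetical links: the west end is off by 1 unit, the east end is correct.
links = [((0.0, 0.0), (1.0, 0.0)), ((10.0, 0.0), (10.0, 0.0))]
for x in (0.0, 5.0, 10.0):
    print(x, "->", rubbersheet((x, 0.0), links))
```

In this example the three points are shifted by different amounts, so different parts of the source layer are stretched differently.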

Rubbersheet Point Selection

Accuracy of rubbersheeting will always be best near control points. Wherever possible, select control points that bound the source layer and are distributed within critical areas of the source layer.

Importance of Data Quality

Data quality can be defined as the "fitness for use" of data for an intended purpose. One advantage of GIS is the integration of diverse data sets. However, data may be used in ways not foreseen by their producers, and by users without the knowledge or experience to judge whether an application is appropriate. A potential pitfall of GIS programs is their ability to hide underlying data quality issues. Understanding data quality is important because without it we do not know what confidence to put in the results of our analysis (i.e. garbage in = garbage out).

Data Quality Factors

1. Scale -- the underlying scale at which the data conform to acceptable accuracy standards.

2. Precision -- a measure of the exactness of measurement.

3. Accuracy -- the degree to which the data matches the true values.

4. Currency -- the degree to which the data is current for the application at hand.
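The difference between precision and accuracy can be illustrated with repeated position readings of a point whose true location is known. In the sketch below, the spread of the readings stands in for precision and the offset of their mean from the true location stands in for accuracy. The function name `describe_fixes` and all coordinates are hypothetical.

```python
import math
import statistics


def describe_fixes(fixes, true_xy):
    """Summarize repeated (x, y) readings of one surveyed point.

    offset: distance from the mean reading to the true position (accuracy).
    spread: scatter of the readings around their own mean (precision).
    """
    xs = [x for x, _ in fixes]
    ys = [y for _, y in fixes]
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    offset = math.hypot(mean_x - true_xy[0], mean_y - true_xy[1])
    spread = math.sqrt(statistics.pvariance(xs) + statistics.pvariance(ys))
    return {"mean": (mean_x, mean_y), "offset": offset, "spread": spread}


# Hypothetical readings: tightly clustered (precise) but 5 units from the truth.
fixes = [(102.0, 203.0), (104.0, 203.0), (102.0, 205.0), (104.0, 205.0)]
print(describe_fixes(fixes, true_xy=(100.0, 200.0)))
```

A data set can therefore be precise without being accurate: the readings here agree closely with each other while all being displaced from the true location.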

Data Quality and Error

The GIS conundrum: looks good does not mean it is good. GIS analyses usually incorporate data collected from diverse sources. Often the data can superficially look good but still contain errors that limit their utility for GIS analysis. Errors can be classified into three types: spatial, attribute and procedural/analytic. Errors can generally occur at three phases of GIS analysis: data collection, data input and editing, and the analysis itself (methodological errors).

Data Collection Errors

Satellite sensors and aerial cameras can introduce error. Surveying equipment and GPS instruments have associated errors. Field recorders or instruments may not always be able to accurately capture the data. Original map documents have inherent inaccuracies. Features change over time (modified, destroyed, added).

Data Entry Errors

-Digitizing errors -- systematic errors are often related to inaccurate geo-registration. Random errors can be introduced by missed or inaccurately drawn features.

-Attribute data entry errors -- humans often make errors in transcribing attributes into GIS.

-Equipment errors -- occasionally scanners, digitizing tables, etc. can go off calibration.

-Errors can be introduced by the underlying data source. Accurately reproducing an inaccurate paper map simply propagates the error. All digitization is limited by the resolution of the underlying data source.

Dealing with Error

Error is closely related to accuracy (i.e. higher accuracy implies fewer errors). There are three classes of errors:

1. Gross errors -- these are "mistakes." They can be detected and avoided via well-designed and careful data collection.

2. Systematic errors -- occur due to factors such as human bias, poorly calibrated instruments, or environmental conditions.

3. Random errors -- these cannot be avoided, but they can be treated with mathematical/statistical models.

Digitizing Errors

Topological errors -- features in digitized data contain artifacts that violate the topological rules of the feature type:

-undershoots

-overshoots

-dangling node -- acceptable in certain circumstances (e.g. streams, roads)
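A simple check for dangling nodes can be sketched by counting how many line endpoints meet at each location: an endpoint that no other line end shares is dangling, which is what an undershoot or overshoot leaves behind. This example assumes lines have already been split at their intersections, and the function name and the rounding used to match coordinates are choices made for the illustration.

```python
from collections import Counter


def find_dangling_nodes(lines, ndigits=3):
    """Return endpoints that are used by only one line end.

    lines: list of lines, each a list of (x, y) vertices.
    ndigits: coordinates are rounded to this many decimals before matching.
    """
    counts = Counter()
    for line in lines:
        for x, y in (line[0], line[-1]):
            counts[(round(x, ndigits), round(y, ndigits))] += 1
    return sorted(node for node, n in counts.items() if n == 1)


# Hypothetical data: a closed triangle plus a spur that ends in open space.
lines = [
    [(0.0, 0.0), (10.0, 0.0)],
    [(10.0, 0.0), (10.0, 10.0)],
    [(10.0, 10.0), (0.0, 0.0)],
    [(10.0, 0.0), (15.0, 0.0)],  # spur: its far end touches nothing
]
print(find_dangling_nodes(lines))
```

Whether a reported node is an error still needs judgment, since the end of a stream or a dead-end road is a legitimate dangling node.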

How Good is Good Enough?

Should we always look for the highest possible data quality level? No. Demanding a higher level of data quality than actually needed quickly becomes a significant and unnecessary expense. The "level of data quality" should be balanced against the "cost of the consequences of less accurate data." One reasonable choice of data quality level is to go for "the minimum level of data quality that will meet your needs."

Data Quality Conclusions

GIS products are models of reality. There is always a degree of error. Error in a coverage must always be addressed. Inherent error may be incorporated into how objects are actually represented in a coverage. Data in a coverage or map are only as good as the least accurate data source. If all GIS products are flawed, how do we deal with this? Document it through metadata.

What is Metadata?

Documentation of the content, quality and condition of the data:

-Who made it? Who distributes it?

-What is the subject, the processing?

-When and where was it collected?

-Why and how was it collected?

-How much does it cost?

-How is it referenced to the real world?

-What is the quality of the data?

-Who should I contact if I have questions?

Metadata Sections:

-Identification

-Data Quality

-Spatial Organization

-Spatial Reference

-Entity and Attribute

-Distribution

-Metadata Reference
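As a small illustration, the section list above can be used as a checklist for a metadata record. The record layout, field names and values below are hypothetical and are not taken from any metadata standard's file format; only the section names come from the list.

```python
METADATA_SECTIONS = (
    "Identification",
    "Data Quality",
    "Spatial Organization",
    "Spatial Reference",
    "Entity and Attribute",
    "Distribution",
    "Metadata Reference",
)


def missing_sections(record):
    """Return the section names that are absent or empty in a metadata record."""
    return [name for name in METADATA_SECTIONS if not record.get(name)]


# Hypothetical, partly completed record for a layer of surveyed points.
record = {
    "Identification": {"title": "Hypothetical site survey points"},
    "Data Quality": {"positional_accuracy": "handheld GPS, within 20 meters"},
    "Spatial Reference": {"projection": "to be confirmed"},
}
print(missing_sections(record))
```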

Why Create Metadata?

It helps users understand the data by providing consistent terminology, focusing on key elements, determining fitness for use, facilitating data transfer and interpretation by new users, making it easier to reuse and update data, and reducing workload and questions about the data. Metadata enables discovery by providing information to clearinghouses and flexibility in searching, and it is key in sharing spatial data and historical documentation. Metadata also limits liability by preventing inappropriate use and providing protection if data are inappropriately used.


© 2003 MATRIX
Project Director: Anne Pyburn
Indiana University Bloomington