Vector Database Construction

We've covered basic file importing, which is one way to build GIS databases. However, secondary data are not always available for your study area, and you may have to build your own dataset. This is often an arduous, time-consuming, and labor-intensive part of a project, and it is prone to errors. You are probably aware of the two main methods of digitizing a map: 1) digitizing on a digitizing tablet, and 2) scanning a map and digitizing on-screen over the scanned image. Both methods have their advantages and disadvantages, but overall, if you have access to a large-format scanner (expensive), digitizing a scanned map is often the way to go.

Indeed, in some cases the entire digitizing process can be avoided if you have a map that contains only the features you want digitized. In these cases, image processing software can automatically extract features from a scanned map to produce vector line features (a process called vectorizing). However, most maps contain a variety of layers, and you want only one of them in each data coverage. In this case, you can either 1) scan the map and on-screen digitize only the features you need, or 2) trace only the features you need onto mylar, scan the tracing, then convert the scanned raster file into a vector file. Occasionally, you can buy what are called 'mylar separates': individual layers of a multi-theme map.

The key to all these methods is the registration of the source product (whether a hardcopy map, a scanned map, or a mylar trace). With a scanned map, make sure you can see registration tic marks on the scanned product. These can be either 1) corners of the image that have coordinates, or 2) better still, a graticule covering the map that gives coordinates at each grid intersection. A graticule is preferable because it gives you more points to choose from, and you can scan just a portion of the map and still be able to georeference (register) it.


Georeferencing is the process of converting an image in file (or page) coordinates to a file in map coordinates in a specific coordinate system, map projection, and datum. For example, a scanned map has an origin and a raster structure in which each point on the map is identified by its file coordinates (e.g., 1244, 1515). The task then is to convert these file coordinates to map coordinates.

There are a couple of types of products that typically need georeferencing. Satellite images can be bought 'unprocessed', in which case the user is responsible for georeferencing them to known coordinates. A satellite image is simply a raster dataset with file coordinates (rows/columns) that need to be converted to map coordinates. Likewise, aerial photography is commonly used in GIS operations; it sometimes comes as already-georeferenced data (e.g., a Digital Orthophoto Quadrangle (DOQ)), but it is often delivered as hardcopy that needs to be scanned, registered, rectified, and then interpreted.

The other type of product that is often georeferenced using GIS software is a hardcopy map that will be used as a backdrop for digitizing, such as a topographic map. This is the case we'll go through here.

In order to georeference an image, you need to identify the map coordinates of at least 3 points on the map that do not all lie on one straight line [note: this assumes an affine transformation is being used; see the ArcGIS online help for more information]. In general, the more spread out these points are around the features you need to digitize, the better your registration will be (i.e., the less error it will have). There are two main methods of registration (linking file coordinates to map coordinates). One method is to manually type in map coordinates for points, then select those points on your image. Another method is to bring up a coverage that is already registered and that contains features that are also visible in the scanned file you want to register.

Basically, you need to use any information you can to identify the map coordinates of features you can see in your scanned product. If the scanned product has coordinates printed on it, great. If not, you'll have to find a georeferenced data source with features that are visible in your scanned product. An affine transformation then warps the file and applies map coordinates to the warped image.
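To make the registration step concrete, here is a minimal Python sketch (not ArcGIS's own implementation) of how a first-order (affine) transformation can be fitted to control points by least squares and checked with a root-mean-square (RMS) error. The function names and the control-point coordinates are hypothetical. With exactly 3 non-collinear points the fit passes through every point; additional points make it a least-squares fit, and the RMS error then shows how well the links agree with each other.

```python
from math import sqrt


def solve3(m, v):
    """Solve the 3x3 linear system m * p = v by Gaussian elimination."""
    a = [list(row) + [val] for row, val in zip(m, v)]  # augmented matrix
    for col in range(3):
        # Partial pivoting: bring the largest remaining entry into the pivot row.
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        if abs(a[col][col]) < 1e-12:  # (near-)zero pivot: singular system
            raise ValueError("singular system: control points may be collinear")
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    p = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):  # back substitution
        p[r] = (a[r][3] - sum(a[r][c] * p[c] for c in range(r + 1, 3))) / a[r][r]
    return p


def fit_affine(file_pts, map_pts):
    """Least-squares fit of x = a0 + a1*col + a2*row and y = b0 + b1*col + b2*row."""
    if len(file_pts) < 3 or len(file_pts) != len(map_pts):
        raise ValueError("need at least 3 matched control points")
    design = [(1.0, col, row) for col, row in file_pts]
    # Normal equations (A^T A) p = A^T t, solved once for x and once for y.
    ata = [[sum(u[i] * u[j] for u in design) for j in range(3)] for i in range(3)]
    atx = [sum(u[i] * x for u, (x, _) in zip(design, map_pts)) for i in range(3)]
    aty = [sum(u[i] * y for u, (_, y) in zip(design, map_pts)) for i in range(3)]
    return solve3(ata, atx), solve3(ata, aty)


def apply_affine(a, b, col, row):
    """Convert one file coordinate to map coordinates."""
    return (a[0] + a[1] * col + a[2] * row, b[0] + b[1] * col + b[2] * row)


def rms_error(a, b, file_pts, map_pts):
    """Root-mean-square distance between transformed and entered map coordinates."""
    total = 0.0
    for (col, row), (x, y) in zip(file_pts, map_pts):
        px, py = apply_affine(a, b, col, row)
        total += (px - x) ** 2 + (py - y) ** 2
    return sqrt(total / len(file_pts))


# Hypothetical control points: four map corners as file (col, row) and
# made-up projected map coordinates in metres.
file_pts = [(0, 0), (2000, 0), (0, 2000), (2000, 2000)]
map_pts = [(500000.0, 4300000.0), (504000.0, 4300000.0),
           (500000.0, 4296000.0), (504000.0, 4296000.0)]
a, b = fit_affine(file_pts, map_pts)
print(apply_affine(a, b, 1244, 1515))     # the file coordinate from the text
print(rms_error(a, b, file_pts, map_pts))
```

Because these made-up points fit an affine transformation exactly, the RMS error is essentially zero; with real links it will be larger, and a noticeable jump after adding one point suggests that point was entered or placed incorrectly.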

A first-order transformation warps the image uniformly across its entire extent. Successively higher-order transformations apply different degrees of warping in different parts of the image. These transformations are polynomial equations that convert file coordinates to map coordinates, and the affine transformation is the first-order case. To visualize a first-order (affine) transformation, consider that straight lines in the un-georeferenced file/image will still be straight in the georeferenced output image, and parallel lines will still be parallel. However, a set of arcs that form a rectangle in the un-georeferenced image may become a parallelogram in the transformed image. In other words, the identical transformation is applied to each arc, and to each part of each arc.
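Written out, with (x, y) as file coordinates and (x', y') as map coordinates (symbol names chosen here for illustration), the first-order transformation and, for comparison, a second-order polynomial are:

```latex
\begin{aligned}
\text{first order:}\quad  x' &= a_0 + a_1 x + a_2 y, &
y' &= b_0 + b_1 x + b_2 y \\
\text{second order:}\quad x' &= a_0 + a_1 x + a_2 y + a_3 x^2 + a_4 x y + a_5 y^2, &
y' &= b_0 + b_1 x + b_2 y + b_3 x^2 + b_4 x y + b_5 y^2
\end{aligned}
```

The first-order equations have three unknown coefficients per coordinate, which is why at least 3 control points are needed; the second-order form has six per coordinate and so needs at least 6. Because x' and y' are linear in x and y in the first-order case, straight lines stay straight and parallel lines stay parallel, whereas the x^2, xy, and y^2 terms let a higher-order transformation bend the image differently in different places.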

After creating registration points within your image, you can then rectify the image to produce a new image with map coordinates tied to a specific spatial reference system.

The rectification process can use a variety of interpolation (resampling) methods to represent the old file pixels in the new, transformed map coordinate space. With nominal data (i.e., categorical, non-continuous data), always use the nearest neighbor method (this is the default). The nearest neighbor method assigns each pixel in the new image the value of the original pixel whose center lies closest to it after transformation. In other words, there is no interpolation or averaging of multiple values, and pixel values don't change. With continuous data (e.g., elevation or population density data), bilinear interpolation or cubic convolution can be used if you want your data to be 'smoothed'. Note: always use nearest neighbor with raw, unprocessed satellite imagery, because you want to preserve the original digital numbers.
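To see why nearest neighbor is the safe choice for nominal data, the Python sketch below samples a small grid of land-cover codes at a location that falls between cell centers. The helper names and the codes are hypothetical, and cell centers are assumed to sit at integer column/row positions. Nearest neighbor returns an existing code, while bilinear interpolation averages neighboring codes into a value that matches no class.

```python
import math


def nearest_neighbor(grid, col, row):
    """Value of the input cell whose centre is closest to (col, row).

    Assumes cell centres sit at integer (col, row) positions and that
    (col, row) falls inside the grid.
    """
    c = min(max(math.floor(col + 0.5), 0), len(grid[0]) - 1)
    r = min(max(math.floor(row + 0.5), 0), len(grid) - 1)
    return grid[r][c]


def bilinear(grid, col, row):
    """Distance-weighted average of the four cell centres around (col, row)."""
    c0 = min(max(math.floor(col), 0), len(grid[0]) - 2)
    r0 = min(max(math.floor(row), 0), len(grid) - 2)
    dc, dr = col - c0, row - r0  # fractional offsets inside the 2x2 block
    top = grid[r0][c0] * (1 - dc) + grid[r0][c0 + 1] * dc
    bottom = grid[r0 + 1][c0] * (1 - dc) + grid[r0 + 1][c0 + 1] * dc
    return top * (1 - dr) + bottom * dr


# Hypothetical land-cover codes (nominal data): 1 = forest, 4 = water, 7 = urban.
landcover = [[1, 1, 4],
             [1, 4, 4],
             [7, 7, 4]]
print(nearest_neighbor(landcover, 0.6, 0.4))  # 1, an existing class code
print(bilinear(landcover, 0.6, 0.4))          # about 1.72, not a valid class
```

For an elevation grid the bilinear result would be a sensible in-between elevation, which is why smoothing methods are reserved for continuous data.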

So georeferencing is a multi-step process:

1) scan the hardcopy map

2) identify and enter registration points ("registering" the map)

3) rectify the registered map to produce a new dataset referenced to a specific projection, coordinate system, and datum ("rectification")
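Step 3 is commonly implemented by inverse mapping: for each cell of the new output grid, take the map coordinates of its center, run the transformation backwards to find the matching location in the scanned file, and resample there. This avoids the gaps that can appear when input pixels are pushed forward one at a time. The Python sketch below assumes an affine transformation, a north-up output grid described by hypothetical parameters (`x_min`, `y_max`, `cell`, `n_rows`, `n_cols`), and nearest neighbor resampling; it illustrates the idea and is not how any particular GIS package is coded.

```python
import math


def invert_affine(a, b, x, y):
    """Map (x, y) -> file (col, row) for x = a0 + a1*col + a2*row, y = b0 + b1*col + b2*row."""
    det = a[1] * b[2] - a[2] * b[1]
    dx, dy = x - a[0], y - b[0]
    col = (b[2] * dx - a[2] * dy) / det
    row = (a[1] * dy - b[1] * dx) / det
    return col, row


def rectify(image, a, b, x_min, y_max, cell, n_rows, n_cols, nodata=None):
    """Fill a north-up output grid by inverse mapping plus nearest-neighbour sampling."""
    out = []
    for i in range(n_rows):
        y = y_max - (i + 0.5) * cell          # map y of the centres in output row i
        out_row = []
        for j in range(n_cols):
            x = x_min + (j + 0.5) * cell      # map x of the centres in output column j
            col, row = invert_affine(a, b, x, y)
            c, r = math.floor(col + 0.5), math.floor(row + 0.5)  # nearest input cell
            inside = 0 <= r < len(image) and 0 <= c < len(image[0])
            out_row.append(image[r][c] if inside else nodata)
        out.append(out_row)
    return out


# Hypothetical 3x3 scan that went through the scanner sideways:
# map x grows with file row, and map y shrinks as file column grows (10 m cells).
scan = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
a = (100.0, 0.0, 10.0)    # x = 100 + 0*col + 10*row
b = (200.0, -10.0, 0.0)   # y = 200 - 10*col + 0*row
print(rectify(scan, a, b, x_min=95.0, y_max=205.0, cell=10.0, n_rows=3, n_cols=3))
# [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```

The output grid is north-up even though the scan was not; output cells that fall outside the scanned file receive the `nodata` value.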

Demonstration - Creating a new Personal Geodatabase and Feature Class


Now that you have a georeferenced scanned image, the next task is to digitize features from it. The most common features to digitize are: 1) points, 2) lines, and 3) polygons. The Getting to Know ArcGIS Desktop book goes through the process of creating and editing vector features in shapefiles and geodatabases. Supplementary information can be found in the ArcGIS Desktop Help and will be presented in class. Key concepts include the following:

1) Point features are created simply by identifying their x,y coordinates.

2) Lines can be snapped together by setting the snapping environment, so that two lines share the same node.

3) Polygons can be created using an Auto-complete function that automatically creates closed polygonal areas and facilitates the creation of polygon features that share a common line with an adjacent polygon.

4) The digitizing process is just one of a series of steps that contribute to the overall error present in a final database. It is important to acknowledge that a final digitized product is only as accurate as the map being digitized (and is probably somewhat less accurate because of the error involved in the digitizing process).
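The snapping idea in item 2 can be sketched as follows: when a new line is digitized, each endpoint that falls within a snapping tolerance of an existing node is moved onto that node, so the two lines share it exactly. This is a simplified Python illustration with hypothetical names (`snap_point`, `add_line`) and a tolerance in map units; editing environments such as ArcGIS also offer vertex and edge snapping, which this sketch leaves out.

```python
import math


def snap_point(point, nodes, tolerance):
    """Return the closest existing node within `tolerance`, else `point` unchanged."""
    best, best_d = None, tolerance
    for node in nodes:
        d = math.dist(point, node)
        if d <= best_d:
            best, best_d = node, d
    return point if best is None else best


def add_line(lines, nodes, vertices, tolerance):
    """Store a digitized line (2+ vertices), snapping both endpoints to existing nodes."""
    start = snap_point(vertices[0], nodes, tolerance)
    end = snap_point(vertices[-1], nodes, tolerance)
    line = [start, *vertices[1:-1], end]
    lines.append(line)
    for node in (start, end):
        if node not in nodes:  # register new endpoints as nodes
            nodes.append(node)
    return line


lines, nodes = [], []
add_line(lines, nodes, [(0.0, 0.0), (10.0, 0.0)], tolerance=0.5)
# The second line was digitized slightly off the first line's end node.
second = add_line(lines, nodes, [(10.3, 0.2), (10.0, 8.0)], tolerance=0.5)
print(second[0])   # (10.0, 0.0): snapped onto the existing node
print(len(nodes))  # 3 distinct nodes
```

Choosing the tolerance is a trade-off: too small and near-misses are left as dangling gaps, too large and genuinely separate nodes get merged.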


ArcGIS contains many sophisticated tools for creating topological relationships between features within and between datasets.

ArcGIS also uses a rules-based methodology for maintaining the topological integrity of datasets. For example, given a dataset of parcels and a dataset of land-use zoning, a GIS analyst may wish to create a rule that zoning polygon boundaries must be coincident with parcel boundaries (so that no single parcel has two zoning types within it). Topological rules can be created within a feature dataset to identify topological errors (in this case, a zoning polygon boundary that is not coincident with a parcel boundary) so that they can be edited and fixed. The ArcGIS Desktop Help has an item, accessible from the Index under "Topology, rules", that describes the point-, line-, and polygon-based topological rules.
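As a rough illustration of the parcel/zoning rule, the Python sketch below flags zoning boundary segments that do not also appear as parcel boundary segments. It is a hypothetical simplification, not ArcGIS's topology engine: polygons are plain vertex lists (first vertex not repeated), the zoning codes are made up, and it assumes coincident boundaries share identical vertices, so a boundary that is coincident but split at different vertices would be wrongly reported as an error.

```python
def ring_edges(ring):
    """Undirected boundary segments of a closed ring given as a list of vertices."""
    n = len(ring)
    return {frozenset((ring[i], ring[(i + 1) % n])) for i in range(n)}


def noncoincident_boundaries(zoning, parcels):
    """List (zone, segment) pairs whose segment is not also a parcel boundary."""
    parcel_edges = set().union(*(ring_edges(ring) for ring in parcels))
    violations = []
    for zone, ring in zoning.items():
        for edge in ring_edges(ring):
            if edge not in parcel_edges:
                violations.append((zone, tuple(sorted(edge))))
    return violations


# Two hypothetical unit-square parcels side by side.
P1 = [(0, 0), (1, 0), (1, 1), (0, 1)]
P2 = [(1, 0), (2, 0), (2, 1), (1, 1)]
parcels = [P1, P2]
good = {"R-1": P1, "C-2": P2}
# Here the zoning line runs through the middle of parcel P2.
bad = {"R-1": [(0, 0), (1.5, 0), (1.5, 1), (0, 1)],
       "C-2": [(1.5, 0), (2, 0), (2, 1), (1.5, 1)]}
print(len(noncoincident_boundaries(good, parcels)))  # 0
print(len(noncoincident_boundaries(bad, parcels)))   # 6
```

Each flagged segment corresponds to a place where an analyst would either move the zoning boundary onto a parcel line or mark the case as an exception.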

Feature attributes/IDs

After digitizing features, the next step is to assign an ID value to each feature you've digitized. This is only necessary if you need to uniquely identify each feature, for example to link it to attribute data in an existing ASCII file (exported from Excel, say) that you will import. If you have something like a streams coverage, where you don't really care about the name of each stream but only where the streams are, then you don't need to worry about this step.

However, in many cases you will need to assign an ID to each feature. For example, imagine a polygon coverage of parcels in Monroe County. Each parcel has a tax-id that identifies it and links it to a county database containing attributes for that parcel, such as the property value, the owner's mailing address, and the county-recognized acreage. To create a GIS database with both the spatial and attribute information, you'd typically: 1) digitize the arcs composing each polygon and assign a label point to each polygon, 2) assign an ID (tax-id) to each label point, 3) import the attribute information into an INFO table, and 4) join that attribute table to the polygon attribute table of the parcel coverage.
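Steps 3 and 4 amount to a table join on the tax-id. The Python sketch below reads a small comma-delimited ASCII attribute table and joins it to polygon attribute records, keeping every polygon (a left join). The field names (`TAX_ID`, `VALUE`, `ACRES`, `POLY_ID`) and all values are made up for illustration. Polygons whose tax-id finds no match are listed separately, since these often signal an ID typed incorrectly during step 2.

```python
import csv
import io


def join_attributes(polygon_records, attribute_rows, key="TAX_ID"):
    """Left join on `key`: every polygon record is kept, matched fields are appended."""
    lookup = {row[key]: row for row in attribute_rows}
    joined, unmatched = [], []
    for rec in polygon_records:
        match = lookup.get(rec[key])
        if match is None:
            unmatched.append(rec[key])
            joined.append(dict(rec))
        else:
            joined.append({**rec, **{k: v for k, v in match.items() if k != key}})
    return joined, unmatched


# Hypothetical county attribute file (comma-delimited ASCII text).
county_file = io.StringIO(
    "TAX_ID,VALUE,ACRES\n"
    "013-05-001,185000,0.42\n"
    "013-05-002,212500,0.55\n"
)
attributes = list(csv.DictReader(county_file))

# Hypothetical polygon attribute records, one per digitized parcel.
parcels = [
    {"POLY_ID": 1, "TAX_ID": "013-05-001"},
    {"POLY_ID": 2, "TAX_ID": "013-05-002"},
    {"POLY_ID": 3, "TAX_ID": "013-05-009"},
]
joined, unmatched = join_attributes(parcels, attributes)
print(joined[0])  # {'POLY_ID': 1, 'TAX_ID': '013-05-001', 'VALUE': '185000', 'ACRES': '0.42'}
print(unmatched)  # ['013-05-009']
```

Note that values read from a text file arrive as strings; numeric fields such as `VALUE` would need converting before any arithmetic.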

Global Positioning Systems - GPS

Lastly, I'll briefly mention GPS data. We will spend a week on this toward the end of the semester, including practice with GPS units outside and a lab exercise. Basically, a GPS allows you to collect point data in a specified spatial reference system. Points collected in sequence can then be aggregated into lines or polygons. As with all types of database construction, GPS is prone to different sources of error. We'll discuss these errors and other issues of GPS data collection later in the semester.
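As a small example of aggregating GPS points, the sketch below treats a sequence of fixes either as a line (summing segment lengths) or as a closed polygon (area by the shoelace formula). It assumes the points have already been projected into planar coordinates in metres, such as UTM; latitude/longitude in degrees would have to be projected first. The function names and coordinates are hypothetical.

```python
import math


def line_length(points):
    """Total length of a GPS track treated as a line (sum of segment lengths)."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def polygon_area(points):
    """Area of the polygon formed by closing the track (shoelace formula)."""
    n = len(points)
    twice_area = sum(
        points[i][0] * points[(i + 1) % n][1] - points[(i + 1) % n][0] * points[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2


# Hypothetical fixes walked around a field, in metres (already projected).
fixes = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0), (0.0, 50.0)]
print(line_length(fixes))              # 250.0 (open line)
print(line_length(fixes + fixes[:1]))  # 300.0 (closed perimeter)
print(polygon_area(fixes))             # 5000.0 square metres
```

Any positional error in the individual fixes carries straight through into these lengths and areas.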