Lab IV: Unsupervised Classification with ERDAS

Last Updated 1/23/08

Introduction: Previous labs have relied on density slicing to identify different cover types in satellite imagery. As you now realize, this process is rather subjective. This exercise will introduce you to a more objective, statistically based approach to identifying various cover types using a process called Unsupervised classification. Unsupervised classification is a method that examines a large number of unknown pixels and divides them into a number of classes based on natural groupings present in the image values. Unlike supervised classification, unsupervised classification does not require analyst-specified data. The basic premise is that pixels from the same cover type should be close together in the spectral measurement space (i.e. have similar digital numbers), whereas pixels from different cover types should be comparatively well separated in spectral space (i.e. have very different digital numbers). This process can simultaneously utilize data from multiple bands.

The classes that result from unsupervised classification are referred to as spectral classes. Because they are based on natural groupings of the image values, the identity of the spectral classes will not be initially known. You must compare classified data to some form of reference data (such as large-scale imagery, maps, or site visits) to determine the identity or information classes of the spectral classes. Each information class will probably include several spectral classes.  Our reference data will come from the site visits done in previous years.  

It is hard to imagine how the natural groupings of the image values are created in a space that has more than three dimensions. Fortunately, statistical techniques are available that can be used to automatically group an n-dimensional set of observations into their natural spectral classes. Such a procedure is termed cluster analysis by statisticians. In the remote sensing literature, cluster analysis is referred to as Unsupervised Classification. Prior to lab, it will be essential for you to read Chapter 12 in your textbook.  Chapters 6 and 7 in the text by David Verbyla will also be useful.  As I mentioned in class on several occasions, this text is on reserve in the main library and copies of these chapters are also available in the class notebook in the Huxley library.  In addition, you might also take a look at the ERDAS Imagine online documentation.  Take a look at Chapter 7: Classification in the Field Guide:

fieldguidebutt

 

Loose Ends: We will be using the ground truth data you collected a week or so ago using the Trimble GPS units.  You will need to export these data from Pathfinder Office into a form that we can use for this lab.  The procedures for doing this are as follows: 

Open Pathfinder Office and go to Utilities-Export. From the Export dialog box, use the browse button to select your “rover” file.  This is the file you created in the field.  It should have a .ssf suffix.  Specify an output folder for your export file.  Under “Choose and Export Setup”, select “Sample ArcView Shapefile Setup, then select Properties.  Go to the Coordinate system tab and specify UTM, Zone 10 North with a Datum of NAD 1927 (Western United States).  Under the Position Filter Tab and the “Include Positions That Are” section, be sure to check the box for “Uncorrected.”  We did not differentially correct our data and if the Uncorrected box is not checked, none of your data would be exported.  Finally, go to the Attributes Tab.  Click on Code Value 1 and be sure that the Position box is checked under Point Features.  I’d suggest that you make sure that none of the boxes are checked under General Attributes other than the Position box under Point Features.  If any other boxes are checked, this will just export a bunch of information that you don’t really need.  Click OK to close the Export Setup Properties dialog box, then click OK again to close the  Export dialog box.  Now close Pathfinder Office and go to the output folder you specified.  You should find three files called LULCII.shp, LULCII.shx and LULCII.dbf.  Double-click on the .dbf file and it will open in Excell.  If you did things correctly, it will contain only the LULC codes and the UTM coordinates for each point.

Reality Check:  Are all of your points realistic?  A good way to check this is to overlay your points on an air photo of our study area.  Open ArcMap and add an air photo of Bellingham from C:/gisdata/airphoto02.  Now add the shapefile you just created.  Take a look at each point and see how things line up.  When you are satisfied, email me all three of these files to Courtney.  We will compiles the data for the class and use these for our accuracy assessment later in the lab. 

 

PROCEDURES

Step 1: The imagery. For this lab we are going to use the Baker-Bay study area that we used earlier this term.  For our very first lab, I had you use data from July of 2005 for this Baker-Bay scene.  Data from this same area from July of 1995 and late September of 2000 is also available.  Note that this Baker-Bay scene is a subset of about 1500 lines by about 2500 columns taken from a full TM scene.  A full TM scene is about 9000 rows by about 9000 columns.  The 2005 Baker-Bay image is available at: J:\saldata\esci442\baker-bay\bakerbay2005.img.  This file contains six database channels that include TM Bands 1, 2, 3, 4, 5, 7.  TM Band 6 (the thermal band) is not included.  We will be using this 2005 image for our classification rather than the older scenes.  Just for fun, you may want to take a look at the older  images.  Among other things, you will note that the 2000 scene has much less snow than the 1995 or 2005 scenes.  This is mostly due to the fact that the 2000 scene is from late September and the 1995 and 2005 scenes are from July.   The snowpack late in the spring of 1995 was about the same as the late spring snowpack in 2000.  Another interesting difference between these scenes is the amount of shadow.  The sun is much lower in the sky in late September and this creates more shadow.  This could complicate a classification effort that involved using the 2000 scene, particularly on north-facing slopes.  Finally, you will notice some differences on these three images that reflect land-use changes between 1995 and 2005.  Later in the quarter, we will be doing a change detection lab to quantify these changes.

Step 2: Unsupervised Classification: We will be using the ISODATA unsupervised classification method that I discussed in class.  To start this routine, click on the Classifier button on the ERDAS Imagine control panel.  classbutton  Then go to Unsupervised Classification

unsupbutt

This brings up the following window:

unsupcl

 

Fill in this window by specifying an input raster image and make up a filename for your ouput image.  Specify Number of Classes: 50.  This will generate 50 spectral classes in your output image.  Be sure to UNCLICK the Output Signature Set box.  Under the Processing Options section, specify 15 iterations and a Convergence Threshold of 0.990.  Leave all other default settings in the window unchanged.  Then click ok to start the classification.  It will take a few minutes for this process to run.  Gazillions of calculations are being completed.  Be patient.  Be amazed.  Be glad we have pretty fast computers and recognize that it would have been impossible to do this in a university class just a few short years ago.

When it is finished

classcomplete

Click the ok button, then open this classified image in a Viewer.  It will not look like much at first, just a gray-scale image.  In this image, each gray value represents a single spectral class. 

Viewing your classified image and Linking Viewers:  After ISODATA has finished running, open your classified image in a Viewer.  This image will contain a single band with 50 classes.  In order to assign spectral classes to information classes, it will be helpful to open a second viewer with either an IR display (TM 4 in the red display gun, TM3 in the green gun and TM2 in the blue gun) or a “True Color” display (TM3 in the red, TM2 in the green and TM1 in the blue).  It will be useful to arrange these two viewers so that they can both be viewed simultaneously.  To do this from the Viewer window, go to View-Tile Viewers.  This will resize both viewers so that they do not overlap.  Another handy trick is to link both viewers.  This will ensure that you are always looking at the same part of the scene in both viewers.  To do this, go to View-Link/Unlink Viewers-Geographical

linkview

When you do this, another window will pop open

linkinst

These instructions were not terribly clear to me.  All this is asking is that you move your cursor into the window that you want to link to.  When you do this, the chain symbol will appear to confirm that you can make a link to this window.  Click one time in this window to make the link.  After the link is established, moving around in one window will also shift the view in the other window.  You can zoom out in one window to get an overview and keep the other window zoomed in to see details in the image.  Linking windows like this is a handy trick that you will find many uses for.

 

Step 3: Preliminary Assignment of Spectral Classes to Information Classes:  With two viewers open and linked, you can begin the process of assigning spectral classes to information classes.  To do this, open the Raster Attribute Editor in the Viewer that contains the classified image.  You will recall that we used the Raster Attribute editor in previous labs.  It is opened from the Viewer window by going to Raster-Attributes. 

Click on a point in the image.  This will highlight a row in the Raster Attribute Editor.  You can then move the sliders around in the Raster Attribute Table so that you can see the Color column.  Click on this cell in the table and select a color to use for this spectral class and then type an appropriate Information Class name into the Class Names column.

Preliminary Class Assignments; Water and Rock: Let’s start this process with a few simple classes simple.  First, zoom your classified image out so that you can view the full extent of the image.  Begin by clicking on Lake Whatcom in the classified scene.  This will highlight the row in the Raster Attribute Table for the spectral class that includes this point in L. Whatcom.  Now click on the Color cell in the Raster Attribute Table for this row and select an appropriate color; perhaps blue.

waterclass

At this point, all the pixels in the classified image that are members of this spectral class will be “painted” blue.  Take a look at the image.  What do you see?  Do you feel that all of the pixels that are blue really represent water?  What are the large blue patches up in the mountains?  Are these really water bodies?  Zoom in on these patches in your other viewer to figure out what these are.  After a bit of exploration, you will see that up in the mountains, in most cases, these patches represent deep shadow on the north facing sides of mountains.  This is a problem.  This is an example of a case where a single spectral class includes more than one information class.

Let’s look at another example of this.  Zoom in on Mt. Baker.  Now click on row 11 in your Raster Attribute Table and assign the color yellow to this spectral class.  This represents exposed rock and soil on the flanks of the mountain, just below the glaciers. 

classlab1.jpg

This seems reasonable.  Now move the slider either of the viewers so that you are looking at the portion of the scene that includes Bellingham.

urbanrock

As you can see, in the Bellingham are, this spectral class seems to include roads and other features in the urban and suburban area.  You might argue that this is another case where a single spectral class includes more than one information class.  It might be nice to be able to separate these urban and suburban rocks and soil from alpine rocks and soil.  How could we do this?  Any ideas?

One way might be to use some information about topography.  In the case of our urban/alpine problem, one topographic feature that would help to separate these two classes is elevation.  In the case of our water/shadow problem a topographic attribute that would separate these two classes is %slope.

No classification will ever be perfect.  Later in this lab, we will consider ways in which slope and elevation might be incorporated into your classification.  Let’s  proceed with the process of assigning all spectral classes to an information class.  NOTE THAT MOST INFORMATION CLASSES WILL INCLUDE SEVERAL SPECTRAL CLASSES!  Some information classes (such as water/shadow) will be contained in the same spectral class.

Step 4: Use of Groundtruth Data to Aid Assignment of Spectral Classes to Information Classes:  At this point, you could use your knowledge of the study area to assign spectral classes to information classes.  This approach is common, however, since we have a large set of ground truth data, we can use it to help us with this task.  The ground truth data are contained in J:\saldata\esci442\baker-bay\lulc_refdata.xls.  Get a copy of this file and take a look at it.  The file contains over 800 points and each have been assigned to a Landuse-Landcover code (see Table 19.1, page 561 of your textbook).  Note that, in this table, I have come up with a simplified set of LULC codes.  The Anderson Level II codes in this table range from 11 to 91.  I have collapsed these down into a total of 11 classes as follows:

2006 Modified Anderson LII LULC codes

 

 

 

Anderson Level II LULC Codes

2006 NEW: Modified LULC Codes

Class description

0

0

Background

11

1

Residential

 12 - 17

2

Urban or Built up lands

211

3

Ag. Pasture/Grass

212-24

4

Crops

31-33

 

Does not occur here

40

5

"Recent" clearcuts; anything cut since 1972

41, 43, 61

6

Deciduous forest

42

7

Conifer forest

51-54

8

Water

71-77

9

Soil/rock

81-85

10

Alpine veg., non-forest

91, 92

11

Snow/ice

We want to use half of the data to aid in assigning spectral classes to information classes (a “training” data set) and use the other half of the data to perform an accuracy assessment (a “test” data set).  You will begin by randomly spitting your data into training and test data sets.

 

Randomly split your data in excel:  Open lulc_refdata.xls.  Create a copy of the “lulc_refdata” worksheet.  On this copy, we will add a column of uniform random numbers.  In cell G2, enter the function “=RAND()”.  This will generate a uniform random number; this will be a value between 0 and 1.  Label this column something like “Random Function.”  Now copy this function to all other lines in the dataset.  This RAND function is a bit odd in that everytime you do anything to the worksheet, it regenerates a new set of random numbers for each cell containing this function.  We want to “lock” our random numbers so, copy the column containing this function and use edit-paste special-values (in Excell2007, this is found under Home-Paste-Paste Special), to paste these random numbers (not the rand() function) into a new column that you might label something like”random values.”  Now sort the data based on this column.  Roughly half of the data will have random values <0.5 and half will have values >0.5.  Take all lines with values <0.5 and copy this to a new worksheet that you should label something like “Training Data.”  The lines with values >0.5 should be copied to a new worksheet that you should label something like “Test Data.”  Now save the training and test data to separate tab delimited .txt files.

Cross Tabulation; An Objective Approach for Relating Information Classes and Spectral Classes: We will now use the Accuracy Assessment function of ERDAS to cross tabulate your spectral classes and Information classes.  At this point, don’t get hung up on the fact that this is called the “Accuracy Assessment Function.”   For now, just keep telling yourself that you are doing a Crosstabulation of your 50 spectral classes against your Information Classes (the Modified LULC Codes in the training data file;  These will be your “reference data.”).

Go to the Classifier button on the ERDAS Imagine main menu, then go to Accuracy Assessment (remember, you are really doing a Crosstabulation at this point).  This brings up the Accuracy Assessment Dialog box.  Go to File-Open in this box and select your bakerbay2005cl50.img image.  Then go to Edit-Import User-defined Points

importpts

This will enable you to pull in our reference data set.  The first step is to import the coordinates of these points.  Start by selecting your traindata.txt  file.  We then need to select the appropriate fields in this file.  The software has no way of knowing which fields in this file represent the X and Y coordinates of your reference sites. 

impoptions

 

In the Import Options dialog box, select a Separator Character of Tab.  This indicates that each field in your data set is separated by a Tab.  Select a Row Terminator Character  of Return newline(DOS) and the Number of Rows to Skip should be set to 1.  The first line in the file is a set of column labels and we want to skip this.  Now click on the Input Preview tab to view the file.  The fields in this file are as follows:

Field Number

Field Contents

1

Year data collected

2

Reference site ID#

3

Northing (meters)

4

Easting (meters)

5

Anderson Level II Code

6

Modified LULC Code

In the Column Mapping window (bottom half of the Import Options Dialog), we need to specify the fields that contain the X and Y coordinates for each Reference site.  Careful here!  The “Northing” represents the Y coordinate and the “Easting” represents the X coordinate.  It is very easy to reverse these and if you do, NONE of the reference points will be located within the bounds of our image!  Plug the appropriate field numbers into the Column Mapping window, then click on ok.  At this point, the X and Y columns in your Accuracy Assessment window should fill with values.

Now we need to import the Modified LULC Codes into the Reference column of the Accuracy Assessment window.  To do this left-click on the Reference column heading label to highlight this entire column.  Then right-click the Reference column heading label to bring up the following selections:

refimport

Select Import.  Then select your traindata.txt file again.  Then select options:

impcol

This brings up the Import Column Options dialog box:

importcolop 

As we did for the coordinates, we need to specify which part of this file we want to import.  Fill in this dialog box as indicated above, then click ok in this box and also click ok in the Import Column Data box.  At this point, the Reference column in your Accuracy Assessment should fill in with values.  Now we need to read the same values from our bakerbay2005cl50.img file.  To do this, go to Edit-show class values in the Accuracy Assessment dialog.  As soon as you do this, the Class column in the Accuracy Assessment dialog should fill in with values.

editshow

Just for fun, it might be useful to take a look at the distribution of our reference (training) sites in our image.  To do this, go to View-Select Viewer in the Accuracy Assessment dialog

viewsel

 

Then left-click once in the viewer that is displaying your spectral class image.  Then go to View-Show all.  At this point all of the reference points, with ID#s, will show up in the viewer.  You may need to go to View-change color, to select a color that contrasts with the color scheme of your image.  This image might be a good one to include in your lab report.

Finally, are ready to do the Accuracy Assessment (again, think Crosstabulation).  To do this, go to Report-Accuracy Report in the Accuracy Assessment dialog box.

accrep

Save this Accuracy Report (Crosstabulation) to a .txt file and bring it into Excel.  You will need to create a table that looks like this:

 

Information Classes

Spectral Class

1

2

…….

11

1

 

 

 

 

2

 

 

 

 

……

 

 

 

 

50

 

 

 

 

This table presents the distribution of our reference sites among information classes and spectral classes.  Why is this useful?  This table provides the basis for assigning spectral classes to information classes.  You will find that some spectral classes are only associated with a single information class.  In other cases, a spectral class may usually be associated with one information class.  In a few cases, you may find that a single spectral class may be more or less evenly split among several information classes.  Pretty handy isn’t it?  OK so use this information as a guide to assign your spectral classes to information classes.  Don’t take the results of this crosstabulation as absolute truth.  Your training data set is not perfect.  In some cases, you will need to ignore the crosstabulation data and make assignments on the basis of your knowledge of the study area and/or on the basis of context or other clues.  Do the best that you can but don’t get too hung up on this step.  Your classification will not be perfect and you will have some problematic assignments.

Since we've created a large number of spectral classes, some of them will certainly fall under the same information class.  This can be easily dealt with by aggregating these spectral classes into the same information class.  For example, three spectral classes that all represent water, but one class might represent shallow water, another - deep water, and third - silty or turbulent water.  These could all be aggregated into a single “water” class.  A more problematic occurs when two or more information classes are represented by the same spectral class.  For example, spectral class that represents the silty water of the Nooksack may also represent the bare exposed soil and rock on the flanks of Mt. Baker.

 

Arranging the Raster Attribute Table: I find that the default setup of the Raster Attribute Table is a bit awkward.  The default setting is to have the columns in the table arranged as follows: Row-Histogram-Color-Red-Green-Blue-Opacity-Class Name.  Here is what each of these represent:

Row – This is the Spectral Class ID #.  These are arbitrary numbers.

Histogram – this is a count of the # of pixels in each spectral class

Color- this is the color assigned to each spectral class.   You will be editing this as you go.  Each color will represent an information class.  In most cases, there will be multiple spectral classes for each color/info class.

Red-Green-Blue-Opacity – This just tells you about the colors you have selected.  If you don’t like the default color palette that you get by clicking on the Color column cell, you could edit these values to generate custom colors.  I don’t recommend that you do this.

Class Name- You will be editing these cells by typing in Information Class names (ex. Forest, water, snow, etc.)

I find that it is useful to rearrange the order of these columns.  To do this, go to Edit-Column Properties

editcol

This will bring up the following window:

coledit

In the Column subwindow, you will see the column headings in the default arrangement.  We want to change this.  The order I prefer is Value-Histogram-Color-Class Name.-Red-Green-Blue-Opacity.  To do this, click on Class Name, then reduce the Display width to 10, then Top.  This brings Class Name to the top of the list.  This should result in the Column Heading order we were wanted.  Now click on ok to close this window.

Don’t forget to save your Raster Attribute Table!  This will save the color scheme and your assignment of spectral classes to information classes.

 

Step 5: Accuracy Assessment: Now you are ready to do a real Accuracy Assessment using your testdata.txt file.  Recall that this is the half of the lulc_refdata.xls file that you DID NOT use in your Crosstabulation.  Before you can do the accuracy assessment you will first need to prepare a recoded version of your spectral class image. 

Recoding your classified image: Unfortunately, we will need to recode your classified image to conduct the accuracy assessment.  At present, the pixel values in your classified image correspond to the spectral class ID numbers.  We need an image that has pixel values that correspond to the Modified LULC Codes in the table above.  The easiest way to do this is to go to Interpreter-GIS Analysis-recode  interpreterbutton  interpbut2interpbut3.  Specify an input file (the file that contains your spectral classes) and an output file (this will contain your recoded Information classes).  Important: be sure to specify the Output Data Type as unsigned 4 bit.  You only are specifying 11 output classes so there is no need to use an unsigned 8 bit output file (the default).

recode

Then push the “Setup Recode button.  This brings up:

themrec  This is where you will do the actual recoding.  The “Value” column corresponds to your actual spectral class values.  The “New Value” column is where you can specify the  information class to which the spectral class should be assigned.  Use the Modified LULC Codes from the table above.  Click OK when you are finished.

 

 

You are now ready to do a real Accuracy Assessment.  Go to the Classifier button on the ERDAS Imagine main menu, then go to Accuracy Assessment.  This brings up the Accuracy Assessment Dialog box.  Go to File-Open in this box and select your recoded image.  Then go to Edit-Import User-defined Points

importpts

This will enable you to pull in our testdata.txt reference data set.  Now go through the same steps you followed above when doing the crosstabulation.  This time you will be comparing apples and apples.  The values in your recoded image represent our Modified LULC codes and so do the values in our Testing dataset.  The results of the Accuracy Report can be used to generate an Error Matrix.  You will want to save the accuracy report to a .txt file and bring it into Excel to manipulate it a bit and prepare your own error matrix.  Refer to the section of your book that discusses classification accuracy and the Error Matrix.  In particular, you will want to read about Producer’s vs. User’s Accuracy.  You will also want to read about Cohen’s Kappa.

After looking at your results, you may choose to go back and tweak your classification a bit to reassign some of your spectral classes to different information classes.    Don’t go overboard with this!  A classification accuracy over 70% is considered quite good.  Your accuracy may be lower than this. You may also decide to combine some of your information classes to achieve a higher accuracy.  In particular, you may choose to combine urban and rocks and/or combine deciduous and conifer forest. 

 

 

Step 6: Area measurements. In the Raster Attribute Editor for your recoded image, add an Area column as we have done for past labs, then export this to an ASCII file and bring it into Excel to so that you can quantify the area in each Information Class.  VERY IMPORTANT NOTE: The pixel size in this image is 25 meters by 25 meters. As you know, Landsat TM data normally has a pixel size of 30m by 30m but the raw TM data was resampled to create this Baker-Bay image.

 

 

Step 7: Knowledge Engineer: Go to Classifier-Knowledge Engineer.  This is a rule-based classifier that allows you to use non-spectral data to refine your classification.  Detailed instructions for using this feature of ERDAS are found in the “Tour Guides.”  Go to ERDAS Imagine 9.1-Online Manuals-Tour Guides.  Go to the Index and find the chapter that deals with the Knowledge Engineer and follow the instructions.

In saldata, you will find a version of the 2000 image that includes elevation and slope, in addition to the TM data (bbay2000slp_ele.img).  Layer 7 in this file is %slope and Layer 8 is elevation in meters.  Note that the elevation of Mt. Baker is about 3000m so I had to use unsigned 16-bit layers to store this information.  This is why the file size is so large for bbay2000slp_ele.img.  Use the slope and elevation data from this file along with your classified image.  We have also created a roads layer for the study area.  Look in the “Baker-Bay Road_buffer” subdirectory.  There are two files, one that buffers out 50m from all roads and the other buffers out 100m from all roads.  Note that these are ArcGIS GRID files.  When you bring these into the Knowledge Engineer, you need to specify a file type of GRID in order to see these files.  Use the GRID files in the “Baker-Bay Road_buffer” subdirectory.    Experiment a bit with this to see if you can separate Urban from Soil/Rock using elevation.  You can probably also separate water from shadow using slope.  Urban can be separated from rocks using either elevation or roads.  Experiment and don’t be afraid to be creative.

 

Step 8: Lab Report. Write a lab report. Tell us which spectral classes (and how many) you combined to form each information class and why. Are some information classes more variable than others? Why? Do you think some of your information classes could be further subdivided with some additional work? What additional information might you need to do this? How many information classes did you end up with? Include all the necessary images and tables. How much area is in each information class?

Several people have expressed confusion about what should be included in your lab report.  This lab has involved quite a bit of work and it will need to be somewhat longer than the lab reports you have written earlier in the quarter.  Your classification was really done in three stages and you should summarize your methods, results and implications/conclusions from each of these two stages.  These stages were:

1. Assignment of spectral classes to information classes based on  reference data.  You split your reference data into a training set and a test set.  You prepared a cross tabulation of your 50 spectral classes and your 11 reference classes and used this to objectively assign spectral classes to information classes.  You then used your test set to do an accuracy assessment.

2. Finally, you used the Knowledge Engineer to refine your map.  This probably enabled you to expand the number of classes in your map.  Do an accuracy assessment of this map using the same test set that you used previously.

For each of these 2 approaches, your report should include:
a) your best map of the study area;
b) an accuracy assessment, probably at several levels (you started doing your accuracy assessment for 11 classes but may have decided to collapse your map down to perhaps 6 classes; how did the accuracy change as you did this?);
c) a table showing the % of the study area in each cover type.

As part of your discussion, you should describe how your results changed with each new approach.  In addition you your quantitative accuracy assessment, you should also do a visual, subjective assessment.  What were the obvious flaws at each stage?  How and why did each stage of your analysis improve your result?  What would your next step be if you wanted to refine this even further? 

 

 

Orginally Created by N. Antonova and D. Wallin; 11/1998



Return to ESCI 442/542 Lab Page

Return to ESCI 442/542 Syllabus

Return to David Wallin's Home Page