UNIVERSIDAD POLIT?CNICA DE MADRID ESCUELA T?CNICA SUPERIOR DE INGENIEROS DE MONTES
OBJECT-ORIENTED ANALYSIS OF REMOTE SENSING IMAGES FOR LAND COVER MAPPING: CONCEPTUAL FOUNDATIONS
AND A SEGMENTATION METHOD TO DERIVE A BASELINE PARTITION FOR CLASSIFICATION1
GUILLERMO CASTILLA CASTELLANO Ingeniero de Montes
To be cited as Castilla, G. (2003) Object-oriented analysis of Remote Sensing images for land cover mapping: conceptual foundations and a segmentation method to derive a baseline partition for classification. Unpublished Ph.D. Thesis. Polythecnic University of Madrid. Madrid,Spain.
ESCUELA T?CNICA SUPERIOR DE INGENIEROS DE MONTES Departamento de Economía y Gestión de las Explotaciones e Industrias Forestales
OBJECT-ORIENTED ANALYSIS OF REMOTE SENSING IMAGES FOR LAND COVER MAPPING: CONCEPTUAL FOUNDATIONS AND A SEGMENTATION METHOD TO DERIVE A BASELINE PARTITION FOR CLASSIFICATION
Guillermo Castilla Castellano Ingeniero de Montes
Directores Joaquín Solana Gutiérrez Doctor Ingeniero de Montes Agustín Lobo Aleu Doctor en Biología 2003
TABLE OF CONTENTS
PREFACE ................................................................................................................................ VI ACKNOWLEDGMENTS..................................................................................................... VIII RESUMEN............................................................................................................................... XI ABSTRACT ....................................................................................................................... XXIII CHAPTER 1: Introduction......................................................................................................... 1 1.1. Introduction .................................................................................................................... 2 1.2. Thesis overview.............................................................................................................. 4 1.3. Landcover mapping: from paper to GIS ........................................................................ 7 1.4. Landscape models, landcover maps and the reality behind ........................................... 8 1.5. The issue of scale ......................................................................................................... 10 1.6. The role of Earth Observation Satellites in landcover mapping .................................. 12 1.7. The information content on landcover of EO data....................................................... 14 1.8. The impact of resolution on the information content................................................... 20 1.9. An example: information on floristic composition from EO data ............................... 23 1.10. Analysing EO data to derive information on landcover............................................. 27 1.11. A critique to pixel-based image classification ........................................................... 29 1.12. The quest for object orientation in Remote Sensing ................................................... 38 1.13. Motivation, objectives and main contributions .......................................................... 48 CHAPTER 2: Conceptual Foundations.................................................................................... 52 2.1. Introduction and overview ............................................................................................ 53 2.2. Assumptions, concepts and definitions ......................................................................... 54 2.2.1. Perceptual constructivism ...................................................................................... 54 2.2.2. Common sense realism........................................................................................... 55 2.2.3. Prototipicality of categories ................................................................................... 55 2.2.4. The basic level of cognitive categorisation ............................................................ 56 2.2.5. Taxonomies and Partonomies ................................................................................ 57 2.2.6. Granular partitions.................................................................................................. 58 2.2.7. Geographic objects................................................................................................. 61 2.2.8. Sorites vagueness ................................................................................................... 64 2.2.9. Supervaluationism.................................................................................................. 65 2.2.10. The egg-yolk representation of regions with vague boundaries .......................... 66 2.2.11. The epsilon-band model of positional boundary error ......................................... 66 2.2.12. Rough projection .................................................................................................. 67 2.2.13. Rough location ..................................................................................................... 68 2.2.14. Rough zonation .................................................................................................... 69 2.2.15. Junctions and arcs................................................................................................. 70 2.2.16. Class attributes ..................................................................................................... 70 2.2.17. Class definitions ................................................................................................... 71 2.2.18. Measurement disks............................................................................................... 71 2.2.19. Attribute measurement ......................................................................................... 72 2.2.20. Geographic fields ................................................................................................. 73 2.2.21. Object demarcation via field thresholding ........................................................... 74 2.2.22. Spatial homogeneity............................................................................................. 74 2.2.23. Admissible disk size............................................................................................. 76 2.2.24. Gaps and islands................................................................................................... 77 2.2.25. Minimum mapping unit (MMU) .......................................................................... 78 2.2.26. The Modifiable Areal Unit Problem (MAUP) ..................................................... 79 2.2.27. Mosaics, conglomerates and facets ...................................................................... 81 IV
2.2.28. Geographic models and the first law of Geography............................................. 82 2.2.29. Object demarcation via Thom’s morphology....................................................... 84 2.3. Model background: Frank’s five-tier ontology ............................................................. 86 2.4. Model motivation and overview ................................................................................... 88 2.5. The Idealistic (I-) model................................................................................................ 91 2.5.1. Field tier ................................................................................................................. 91 2.5.2. Object tier............................................................................................................... 92 184.108.40.206. Point-wise field classification ......................................................................... 92 220.127.116.11. Field segmentation .......................................................................................... 94 2.5.3. I-model summary ................................................................................................... 96 2.6. The realistic (R-) model ................................................................................................ 97 2.6.1. Field tier ................................................................................................................. 97 2.6.2. Object tier............................................................................................................. 101 18.104.22.168. Image segmentation....................................................................................... 102 22.214.171.124.1. The first partition as the primal sketch of the structure of the image..... 103 126.96.36.199.2. The size constraint.................................................................................. 104 188.8.131.52.3. The final partition as the baseline to object-oriented classification ....... 106 184.108.40.206.4. Granules ................................................................................................. 109 220.127.116.11. Granule classification.................................................................................... 111 2.6.3. R-model summary ................................................................................................ 115 2.6.4. Further considerations on the R-model ................................................................ 117 2.7. Class-concepts and the inadequacy of the spectrometric approach ............................ 119 CHAPTER 3: The baseline method ....................................................................................... 125 3.1. Introduction ................................................................................................................. 126 3.2. Method overview......................................................................................................... 127 3.3. Method evaluation....................................................................................................... 130 3.4 The normalized vector distance (NVD) ....................................................................... 131 3.5. Gradient magnitude image .......................................................................................... 132 3.6. Image smoothing ......................................................................................................... 133 3.7. Watershed partition ..................................................................................................... 138 3.8 Region merging ............................................................................................................ 145 3.9 Vectorization ................................................................................................................ 149 3.10. Examples ................................................................................................................... 150 3.11. Discussion ................................................................................................................. 162 CHAPTER 4: Conclusions..................................................................................................... 168 4.1. Conclusions ................................................................................................................. 169 4.2. Future work ................................................................................................................. 172 4.2.1. Refinement of the R-model .................................................................................. 172 4.2.2. Improvement of the baseline method ................................................................... 173 4.2.3. Classification of granules ..................................................................................... 174 REFERENCES................................................................................................................... 175 APPENDIX 1. The hierarchical organisation of the Universe.......................................... 186 APPENDIX 2. The concept of information ...................................................................... 192 APPENDIX 3. Resolution-limited representations of geographic space........................... 197 APPENDIX 4. The Forest Map of Spain (MFE) ............................................................... 199 APPENDIX 5. List of Acronyms....................................................................................... 203
"Individuals who break through by proposing a new paradigm are almost always...either very young or very new to the field whose paradigm they change....These are the men who, being little committed by prior practice to the traditional rules of normal science, are particularly likely to see that those rules no longer define a playable game and to conceive another set that can replace them." -- Thomas S. Kuhn in The Structure of Scientific Revolutions (1962).
Although inspiring, the above quotation is by no means applicable to myself. However, it keeps some resemblance with the circumstances that gave birth to this thesis. After completing in 1990 my studies in Forest Engineering at the Polytechnic University of Madrid, I began the doctoral courses hoping that I would get some financial support to undertake a Ph.D. Thesis. Since unfortunately this was not the case, I completed the courses and took another path that got myself involved in development aid for several years. During all that period I had with me this unsatisfied academic zeal. Vg. while in Mozambique, I applied for a Ph D. program at CATIE in Costa Rica and even was accepted, but being a national of a developed country, I was suggested to get my own funding... The path to this thesis began in Málaga in Christmas 1997. I was in a dinner with some old university fellows when I received a proposal from the Forest Map of Spain (MFE) team, with whom I had collaborated in the past, to apply for an ESA/CDTI grant: I would finally have data and time to write my Ph.D. Thesis and they would get in turn a method to update the Map with satellite imagery. I applied for it and some months later, while working in Guatemala, I was informed that I had been granted by the Spanish Ministry of Education to stay for two years as a trainee in ESRIN, the European Space Agency (ESA) Earth Observation data handling and exploitation centre, located in Frascati, Italy. I quitted my overseas job and started my stay in ESRIN in December 1998.
By the time I arrived there, I hardly had some bare notions about Remote Sensing, so it took me several months to familiarize with this field. But as soon as I got my first Landsat image and could overlay the MFE coverage on it, in observing that most polygons were distinguishable from their neighbours, I realized a somehow na?ve thought that became the motto of this research: ‘If the eye can see the difference, the problem is not in the data but in the analysis’. Such trivial conclusion, derived from the poor results I obtained both from supervised and unsupervised classifications, led me to embark on an exciting inquiry that for the time being has ended up in the present document. Unfortunately, my ESRIN traineeship came to an end before I could manage to put all the pieces of knowledge I had gathered into a workable scheme. I retired to my hometown Granada in order to carry on with the thesis, with the only sponsorship of my parents, who generously provided me with food and shelter, and of some temporary jobs I got with the help of friends and colleagues, which allowed me to retain my self-esteem. Albeit such voluntary seclusion has been hard to bear, it gave me a total intellectual freedom without which I probably would have never written a thesis like this. However, I was never in isolation during that period. I was connected to the world through a thin ADSL wire that allowed me to keep in touch with colleagues and to search the internet for the most diverse documents. Indeed, this thesis is a product of the internet era, for it gave me the wings I needed in order to collect the nectar of scattered ideas. Having dissolved them into a loosely coupled whole –this thesis, now it is time for others to transform it into a richer nourishment for the progress of this field.
Granada, Spain, December 2002
During the long four years that took this thesis to complete, I received the help and support of many people and institutions. First of all, I am particularly indebted to the MFE team, especially to Juan Ignacio García Vi?as and Professor Juan Ruiz de la Torre. The former embarked me in this exciting enterprise and gladly helped me in numerous tasks, including a visit to the study sites, the arrangement of some presentations, and networking with some individual and institutions. The latter offered me a generously paid consultancy job when, some months after coming back to Spain from Italy and running out of money, I was about to give up and resume my activity overseas as an aid worker. They also supported my application to the ESA traineeship, which turned out to be decisive for my acceptance. I am also thankful to the Spanish Ministry of Education and the ‘Centro para el Desarrollo Tecnológico Industrial’ (CDTI), particularly to Mr. Manuel Serrano, for the concession of a grant from the ‘subprograma de becas de especialización en organismos internacionales’, which allowed me to stay for two years at ESRIN as a trainee. There at Frascati there is a great bunch of people to whom I am indebted. I whish to thank my successive tutors at ESRIN, Marc Gorman, Luigi Fusco and Olivier Arino for providing me with all the data and facilities I needed. Mr. Juerg Lichtenegger, although not his duty, acted as a mentor who guided the first steps of this research and not only, he was always available and even lent me his personal laptop when I suspended my traineeship during a month to travel to Mozambique to help a local NGO after the flooding of March 2000. Massimo Barbieri and Simone Paoloni kindly helped me in many occasions, mostly related to the processing of images requiring a specialised software, but not only. Maria Cristina Guarrera and Monica Rollán helped me with the gathering of papers even after having returned to Spain. And many other friends gave me a hand and counsel perhaps more often than they wished, to mention a few: Daniel Esteban, Marcello Mariucci, Nick Walker, Josep Closa, Lucio Castellano, Carlos Linares, Diego Fernández, and many others...
Out of ESRIN, I wish to express my gratitude to Dr. H?kan Olsson, who gave me the opportunity to visit Sweden and meet his team at the SLU RS lab in Ume?. I am also thankful to Paul Smits, Olle Hagner, Mario Chica-Olmo and Javier Martínez-Millán for helpful comments. Many thanks to my supervisors, Dr. Joaquin Solana, who gently urged me to stop digressing and to write this otherwise never ending story, bore my long explanations, cleared the bureaucratic laberynth, and even helped me to get a temporary academic job; and Dr. Agustin Lobo, for having stood up all this period even after the refusal of the funding proposal that was supposed to back up our relationship, for his innumerable sharp comments, for proofreading drafts, and for bearing my stubbornness and reluctance to be guided. I am also indebted to my tutor during my university years, Dr. Eugenio Martínez-Falero, for many things, but particularly for having managed to get for me the extensions I should have requested while I was working overseas in order to maintain open the possibility of completing the Ph.D. degree. Special thanks to Reyes Ruiz for his friendship, encouragement and support, without which the Spanish period of this research would had been much harder. Other best friends have being cheering me up, specially in that final 10% that takes 90% of the time, among them Ramon Santiago. Ending up this list with the most beloved persons, I would like to thank my parents and siblings, for their loving and material support, and for their enormous patience. I still remember telling them shortly after coming back from Italy that it would only take me a few months... My final thanks go to my girlfriend, Victoria, who in addition to love and care has given me the emotional balance I needed in order to face this personal challenge.
jamás hubiese visto la luz
RESUMEN__________________________________________________________________ “ANALISIS ORIENTADO A OBJETOS DE IM?GENES DE TELEDETECCI?N PARA CARTOGRAFIA FORESTAL: BASES CONCEPTUALES Y UN METODO DE SEGMENTACION PARA OBTENER UNA PARTICION INICIAL PARA LA CLASIFICACION”
RESUMEN La toma de conciencia de la opinión pública sobre el medio ambiente, cuyo inicio se puede situar después de la Conferencia de Estocolmo de 1972 y su madurez tras la Cumbre de Río de 1992, ha dado lugar a la creación de ministerios y agencias regionales dedicados a su gestión y protección en la mayoría de los países. Estas instituciones precisan para desempe?ar eficazmente su labor de información detallada y al día sobre aspectos relevantes del territorio que controlan. En consecuencia, la demanda de información geográfica ha crecido sustancialmente en las últimas décadas, a un ritmo quizá sólo superior al del desarrollo de las tecnologías que la hacen posible y accesible a los ciudadanos. Con base en esa información geográfica, las autoridades toman decisiones, y a través de ella el público se forma una opinión que a su vez influye en éstas. Por tanto la calidad de la información, en términos de extensión, detalle y frescura, es un requisito ineludible para una sociedad pudiente preocupada por su medio natural. Y en un país como Espa?a, donde cada a?o se construyen numerosas infraestructuras, se producen cambios de uso de suelo de rústico a urbano y de agrícola o improductivo a forestal, y donde las cubiertas vegetales pueden cambiar drásticamente a causa de los incendios, la necesidad de disponer de información geográfica actualizada es aún más patente. De entre esa información, uno de los temas más importantes es la distribución espacial de los diferentes tipos de cubierta existentes, pues en función de su naturaleza y estado se ordenan las actuaciones y normativas en cada zona concreta. En las áreas naturales (entendidas como no agrícolas ni urbanas), la vegetación –o su ausencia- es el factor primordial que define no sólo los tipos de cubierta, sino las otras características de los ecosistemas que constituye. En Espa?a se denominan forestales a todas las zonas cubiertas por vegetación natural aunque consistan en formaciones no boscosas, por lo que en adelante se hablará de cartografía y de mapas forestales, cuya actualización es el problema de fondo que esta tesis aborda.
RESUMEN__________________________________________________________________ La información sobre las zonas forestales tradicionalmente se ha obtenido por medio de recorridos de campo e inventarios que recogen datos que luego son resumidos en una base de datos geográfica. Hasta hace unos pocos a?os, esa base de datos se presentaba de forma visual y escueta como un dibujo simbólico en una hoja grande de papel: el mapa forestal. Los mapas tratan de hacer observables a una escala háptica (esto es, distinguibles a simple vista a una distancia inferior a la que alcanza la mano) los aspectos de interés de un territorio relativamente extenso. Lo que vemos por ejemplo en un mapa topográfico 1:50.000 de una zona dada sosteniéndolo con los brazos extendidos se podría aproximar a lo que veríamos a través de un marco de las mismas dimensiones colocado horizontalmente y sostenido de igual forma a unos 25.000 m. de altitud sobre esa zona, con la diferencia de que en el mapa los elementos de interés aparecen resaltados y/o exagerados, como p.ej. las carreteras. Los mapas forestales usan normalmente como fondo un mapa topográfico sobre el que se a?aden una serie de recintos contiguos o teselas que corresponden a zonas relativamente homogéneas respecto al tipo de cubierta vegetal. Sobre cada tesela se dan una serie de informaciones visuales a través de colores, tramas o sobrecargas geométricas, símbolos y letras. Generalmente se acompa?a el mapa de una memoria en la que se exponen otra serie de informaciones que por su carácter excesivamente detallado (o bien general) no pueden ser representadas en el mapa. La escala empleada depende del objetivo del mapa, de las restricciones presupuestarias , y de la base cartográfica disponible, que típicamente oscila de 1:10.000 (mapas a nivel local) a 1:1.000.000 (a nivel nacional). El desarrollo de la fotogrametría a partir de la Primera Guerra Mundial supuso un notable avance para la cartografía forestal, que dejó de basarse en croquis y mediciones sobre el terreno para apoyarse en fotos aéreas, técnica aún vigente hoy en día. Las teselas se delinean manualmente sobre las fotos y posteriormente se pasan al mapa con la ayuda de un restituidor. La fotointerpretación consiste en la delimitación e identificación de regiones homogéneas en la imagen, y se basa en las diferencias visuales que produce cada tipo de cubierta. Los caracteres que se usan a este fin son el color o el tono, la textura, el tama?o, forma y patrón de distribución de los objetos, y el contexto. Todos esos elementos tomados en su conjunto permiten al fotointérprete establecer un diagnóstico sobre el tipo de cubierta presente en cada tesela, lo que posibilita cartografiar la vegetación con relativamente poco trabajo de campo. Este consiste normalmente en la inspección de algunas teselas según un determinado muestreo, el cual se realiza normalmente en dos etapas: una primera para la propia XII
RESUMEN__________________________________________________________________ elaboración del mapa, y otra posterior para estimar su precisión (frecuencia de errores de clasificación y/o delineación). El proceso de producción de cartografía forestal descrito arriba es demasiado lento y costoso como para permitir una frecuencia de actualización similar a la de los cambios en el territorio, siendo el periodo entre renovaciones tipicamente superior a diez a?os. Este método requiere una cantidad considerable de personal especializado, y posiblemente un vuelo fotogramétrico ad hoc; además presenta el problema de la subjetividad del fotointérprete a la hora de trazar los bordes de las teselas, lo que en futuras revisiones muchas veces dará lugar a correcciones aun sin haberse producido cambios. Sin embargo, los avances en las dos últimas décadas en la informática, que han disminuido drasticamente el coste de almacenamiento de datos e incrementado espectacularmente la capacidad de procesamiento; en telecomunicaciones, que permiten un intercambio masivo e instantáneo de información entre cualquier parte del mundo; y en teledetección, que han aumentado continuamente la resolución espacial, espectral y temporal de los datos recogidos por los satélites de observación de la Tierra, han cambiado totalmente el panorama. Hoy en día los mapas forestales ya no son unicamente una serie de informaciones visuales presentadas en papel. Están integrados en un Sistema informatizado de Información Geográfica (SIG) que almacena y explota grandes bases de datos geográficos estructuradas en capas temáticas, las cuales pueden ser combinadas facilmente para adaptarse a las necesidades del usuario. Los SIG posibilitan la cartografía automática de los resultados de un determinado análisis estadístico, y la reproducción de la información, bien en la pantalla o impresa, a cualquier escala y con diferentes presentaciones. La expansión comercial del software de SIG, cada vez más versátil y económico, unido a la creciente difusión de internet, permiten en la actualidad a los ciudadanos acceder con facilidad a toda clase de informaciones sobre su territorio, con lo que la demanda de este tipo de información está creciendo exponencialmente. Los avances mencionados han llevado a un mayor uso de métodos automáticos de análisis de imágenes de satélite para la elaboración de cartografía temática. Sin embargo, la calidad de los mapas resultantes está por debajo de los estándares exigidos por las instituciones que los encargan. En consecuencia, la mayoría de los proyectos de cartografía aún se apoyan en cierta medida en la fotointerpretación. No obstante, ésta técnica no sólo es menos eficiente, sino que XIII
RESUMEN__________________________________________________________________ conlleva un alto grado de subjetividad. Por tanto, las conclusiones sobre el cambio de tipo de cubierta derivadas de la comparación de sucesivas actualizaciones realizadas por este método son poco fiables. Considerando el mayor énfasis que actualmente se da al seguimiento de los cambios en el territorio, hay una mayor necesidad de mejorar y agilizar el modelado del paisaje. Para ello se requieren métodos automáticos, sólidamente fundados y de mayor precisión, que constituyan la base operativa sobre la que mantener actualizada la información geográfica que orienta las actuaciones de las agencias territoriales de medio ambiente. Por otro lado, el enfoque comúnmente usado para analizar las imágenes de satélite con fines cartográficos da lugar a resultados insatisfactorios debido principalmente a que unicamente utiliza los patrones espectrales de los píxeles, ignorando casi por completo la estructura espacial de la imagen. Este enfoque (denominado en esta tesis ‘espectrométrico’) se basa en la discriminación de firmas espectrales, las cuales están normalmente constituidas por los valores que adopta cada píxel en cada una de las bandas que constituyen la imagen multiespectral, las cuales se forman según se detalla a continuación. El sensor adquiere datos que son agrupados espacialmente en una matriz o ráster en la que cada celdilla corresponde a una medición de una se?al eléctrica que es función de la cantidad de radiación recibida en esa posición y momento. La medición es discretizada por un convertidor analógico-digital a una escala finita o rango dinámico (de 0 a 255 para la mayoría de los sensores ópticos), y el valor resultante es introducido en esa celdilla en forma de Número Digital (DN). La radiación incidente es separada antes de alcanzar los detectores del sensor por medio de un sistema de prismas y filtros, de forma que cada banda de una imagen multiespectral corresponde a la radiación capturada en un intervalo particular del espectro electromagnético. Los valores de cada celdilla, representados a lo largo del espectro, se pueden interpolar dando lugar a una curva o firma espectral, que aunque más grosera tiene una cierta similitud con la que se obtiene de los espectrómetros de sobremesa, de ahí que cada píxel se considere como una muestra individual. Estas firmas se pueden también representar como puntos de un espacio multivariante en el que cada eje ortogonal se refiere a una banda y está constituido por el conjunto ordenado de valores del rango dinámico. La clasificación espectrométrica de las imágenes consiste por tanto en delimitar las regiones de ese espacio asociadas a cada clase. Las clases deben satisfacer los siguientes requisitos: - Exhaustividad: debe haber una clase que asignar a cada píxel de la imagen. XIV
RESUMEN__________________________________________________________________ - Separatividad: las clases deben ser separables en el espacio multivariante con el clasificador empleado. - Utilidad: las clases deben cubrir las necesidades de información del usuario. El requisito de separabilidad implica que las firmas de clases diferentes estén relativamente distantes las unas de las otras, de forma que su grado de solape sea despreciable. Sin embargo para que esto se cumpla, cada cluster (nube de puntos) del espacio multivariante debe contener una clase mayoritaria. Por otro lado, la cuadrícula de terreno sobre la cual el sensor realiza la medida de cada píxel debe ser suficientemente grande como para incluir los elementos que producen la firma espectral típica de cada clase. Dicho de otro modo, el tama?o de la cuadrícula debe ser tal que un observador situado en el centro de ella tenga suficientes elementos de juicio como para asignar la clase correcta restringiendo la observación al interior de la cuadrícula. El tama?o mínimo necesario para realizar una clasificación correcta sobre el terreno recibe el nombre de resolución espacial de la clasificación. Por tanto, una premisa básica del enfoque espectrométrico es que la extensión de la cuadrícula sobre la que se extrae la muestra (es decir, el tama?o del píxel) supere esa resolución. Ahora bien, cuanto mayor sea el tama?o de la cuadrícula, mayor será el porcentaje de píxeles “mezclados”, es decir, píxeles incluyen bordes entre teselas. Como la firma espectral de éstos es una mezcla de las firmas típicas de las clases de esas teselas, su posición en el espacio multivariante corresponderá a las zonas que separan clusters de clases diferentes. Sin embargo, el propio concepto de cluster requiere que éste esté separado de otros por una discontinuidad, eso es, por una región del espacio multivariante casi vacía. Por tanto la premisa del tama?o suficiente y del solape despreciable no pueden ser satisfechas simultáneamente cuando las unidades analizadas son píxeles individuales, a no ser que se estudie un territorio relativamente simple (como un paisaje agrícola con grandes campos de cultivo) con unas clases de cubierta muy generales. A pesar de esto, el enfoque basado en píxeles sigue siendo el paradigma comunmente aceptado en esta disciplina. Para entender el porqué de esta situación, nos tenemos que remontar a los orígenes de la teledetección espacial allá por los a?os setenta. El tama?o de píxel con que ésta comenzó (80 m) era compatible con la resolución espacial de la mayor parte de las clasificaciones. A esa escala, era más natural considerar las clases de cubierta como materiales homogéneos distribuidos sobre el territorio en parcelas mayores que un píxel, por tanto era razonable XV
RESUMEN__________________________________________________________________ asimilar cada píxel individual a una muestra introducida en un espectrómetro que puede ser analizada por separado. Conforme el progreso técnico permitió mayores resoluciones, la variabilidad radiométrica de las clases aumentó, por tanto se hizo necesario por un lado incorporar al análisis alguna característica espacial como la textura que pudiera paliar esta heterogeneidad, y por otro realizar un pre y/o un post-procesamiento basado en filtros que redujese la inconsistencia espacial de las imágenes clasificadas. La aparición a principios de este siglo de satélites civiles de muy alta resolución (< 5m.) ha puesto de manifiesto las deficiencias del enfoque espectrométrico cuando no se cumple la premisa del tama?o. Además, la equiparación de las clases de cubierta a tipos de materiales homogéneos permite que cualquier parte arbitrariamente delimitada dentro de una tesela del mapa siga siendo un referente del concepto definido por su etiqueta. Esta posibilidad es incongruente con el modelo jerárquico del paisaje cada vez más aceptado en Ecología del Paisaje, que asume que la homogeneidad depende de la escala de observación y en cualquier caso es más semántica que biofísica, y que por tanto los paisajes son intrínsecamente heterogéneos y están compuestos de unidades (‘patches’) que funcionan simultaneamente como un todo diferente de lo que les rodea y como partes de un todo mayor. Por tanto se hace necesario un nuevo enfoque (orientado a objetos) que sea compatible con este modelo y en el que las unidades básicas del análisis sean delimitadas de acuerdo a la variación espacial del fenómeno estudiado. Esta tesis pretende contribuir a este cambio de paradigma en teledetección, y sus objetivos concretos son: 1) Poner de relieve las deficiencias del enfoque tradicionalmente empleado en la clasificación de imágenes de satélite. 2) Sentar las bases conceptuales de un enfoque alternativo basado en zonas básicas cartografiables. 3) Desarrollar e implementar una versión demostrativa de un método automático que convierte una imagen multiespectral en una capa vectorial formada por esas zonas. El modelo jerárquico concibe el paisaje como un mosaico de unidades funcionalesestructurales anidadas jerárquicamente, de forma que cada unidad se puede considerar compuesta de subunidades que interactúan más entre ellas que con subunidades de unidades vecinas, con lo que cada unidad forma un todo más o menos integrado, esto es, un objeto. A cada nivel superior de la jerarquía le corresponden objetos cada vez mayores: árbol, bosquete XVI
RESUMEN__________________________________________________________________ o rodal, bosque. Para cada nivel jerárquico se puede establecer un umbral de tama?o (la unidad mínima cartografiable) por debajo del cual se asume que no existen –o no interesanobjetos de ese nivel. Bajo este prisma, cada clase de cubierta es un concepto geográfico que se refiere a una serie de objetos de un nivel específico de esa jerarquía que tienen una estructura y funcionamiento similares. Por tanto una región arbitrariamente delimitada dentro de esos objetos no puede ser un referente de ese concepto, pues le faltan la unidad del todo y la diferencia con el resto (una parcela de inventariación dentro de un bosque no es un bosque). El hecho de que muchos de estos objetos geográficos tengan bordes difíciles de delimitar no implica que su interior sea ontológicamente dependiente del mapa, más bien son los bordes que los separan las creaciones humanas, y es la selección de los criterios objetivos para su delimitación lo que genera su dependencia del analista. Así, la multiplicidad de alternativas que existen a la hora de dibujar el límite de un bosque no refleja más que la vaguedad del concepto bosque, y esa vaguedad solo puede ser reducida a?adiendo a la definición del diccionario enunciados más precisos sobre lo que es un bosque bajo la óptica del mapa, que sean cuantificables con los medios disponibles para su elaboración. Otro problema relacionado es cualquier bosque, siendo un objeto complejo, contiene partes (p.ej. calveros) que si son aisladas de su entorno no pueden ser reconocidas como partes del bosque, por lo que el enfoque espectrométrico, que supone que el bosque es homogéneo en toda su extensión, presenta serias deficiencias a la hora de abordar la heterogeneidad intrínseca de los paisajes. Se hace por tanto necesario un nuevo enfoque que sea más acorde con el modo natural de interpretar el paisaje que tenemos los humanos, consistente en dividirlo en una serie de entidades discretas u objetos que son referentes de la serie de conceptos que usamos para dar sentido a lo que vemos. El enfoque alternativo que se sigue en esta tesis está basado en el análisis orientado a objetos, que trata de modelar el campo de estudio usando objetos que son instancias (ejemplos particulares) de clases, y que es especialmente adecuado para analizar el paisaje bajo el citado modelo jerárquico. Los objetos se agrupan en clases organizadas jerarquicamente que permiten a las clases inferiores heredar propiedades de las clases superiores de las que proceden, y esta estructura da lugar a una jerarquía de objetos en subobjetos y superobjetos que a su vez permite la encapsulación (ocultación) de la información. A la vista del fracaso del enfoque espectrométrico cuando se aplica a imágenes de los nuevos sensores de más alta resolución, el análisis orientado a objetos está adquiriendo cada vez más auge en XVII
RESUMEN__________________________________________________________________ teledetección, aunque de momento carece de una base teórica sólida específica a este campo y a su aplicación a la cartografía del paisaje. Esta tesis intenta sentar las bases conceptuales de este enfoque, proporcionando además un método automático de segmentación para obtener una partición inicial de la escena que sirva de base a la clasificación. La estrategia que se propone es producir, basándose en la estructura espacial de las imágenes, una partición de estas en la que cada región puede considerarse relativamente homogénea y diferente de sus vecinas, y que además supera (aunque no por mucho) el tama?o de la unidad mínima cartografiable. Estas regiones son las unidades básicas de la clasificación, sobre las cuales se pueden definir una serie de atributos espaciales (forma, tama?o, orientación), estructurales (disposición interna, tono, textura y contraste entre las diferentes partes que las componen) y contextuales (relaciones con regiones vecinas) que no son aplicables a píxeles individuales. Cada región se asume corresponde a un rodal que tras la clasificación será agregado junto a otros rodales vecinos en una región mayor que en conjunto pueda verse como una instancia de un cierto tipo de objetos que más tarde son representados en el mapa mediante teselas de una clase particular. Esta estrategia se basa en un modelo en tres niveles del territorio, en el que el último nivel es el mapa forestal en sí. El primer nivel está constituido por objetos solidarios a la superficie terrestre y no observables a las escalas de esos mapas, como árboles y edificios. El segundo nivel está formado por una serie de campos geográficos (variables regionalizadas continuas), que en la versión idealista del modelo corresponden una serie de atributos cuantitativos, uno por campo, derivados de la medición (en parcelas circulares centradas en cada punto del territorio) de alguna característica relacionada con las propiedades y/o distribución espacial de los objetos del primer nivel. En la versión realista del modelo no es posible obtener los anteriores campos por motivos económicos, y en su sustitución se recurre a otros campos geográficos, indirectamente relacionados con los atributos biofísicos usados para clasificar el paisaje, que son el conjunto de ortoimágenes de satélite disponibles. Finalmente, en el tercer nivel, las diferencias cuantitativas reflejadas por los campos son transformadas en diferencias cualitativas entre objetos, esto es, se pasa de una representación numérica del territorio, fácilmente manipulable por un ordenador, a una representación simbólica, mucho más asimilable y manejable por una persona.
RESUMEN__________________________________________________________________ El paso del segundo al tercer nivel se realiza en primera instancia por un proceso de
morfogénesis basado en la teoría de catástrofes, que da lugar a la delimitación de una serie de regiones primarias disjuntas que corresponden al área de influencia de cada atractor local (punto de mínima variación en una determinada región del campo). Las regiones primarias adyacentes se unen entre sí según el parecido de sus valores medios en los atributos considerados, y esta agregación tiene lugar hasta que las regiones resultantes alcanzan el tama?o mínimo que se les supone a los objetos que se pretenden clasificar. A partir de ese punto comienza la clasificación, en la que el usuario aporta su conocimiento sobre cada tipo de objeto que quiere resaltar en el territorio, y lo expresa p.ej. a través del rango de valores admisibles para cada clase y atributo, y adicionalmente, como un conjunto de reglas relacionales que pueden conducir al cambio de significado (etiqueta de clase) de un objeto según su contexto. Un posible método de clasificación es propuesto pero no desarrollado en la tesis. En la versión realista del modelo, el proceso de morfogénesis se simula mediante un filtrado difusivo no lineal (que elimina la textura y respeta los bordes) de la ortoimagen de satélite, seguido de una transformación geodésica en microcuencas (‘watersheds’) de la imagen conjunta de magnitud del gradiente, transformación que equipara esta última imagen con un modelo digital del terreno, y que delimita el área de influencia de cada mínimo local de gradiente. Para crear la imagen de gradiente, se define previamente una medida de la disimilitud radiométrica entre píxeles adyacentes de la ortoimagen multiespectral filtrada, de forma que los números digitales asociados a los píxeles de la imagen de gradiente representan el valor absoluto de la máxima variación en similitud existente en el entorno de cada píxel de la ortoimagen. Las ‘microcuencas’ son a continuación agregadas por adyacencia siguiendo un orden definido por su similaridad radiométrica, hasta que las regiones resultantes superan todas el tama?o de unidad mínima cartografiable. En cada iteración, se comienza uniendo los pares de regiones adyacentes que presentan la diferencia más baja, permitiéndose una única unión por región e iteración, e impidiéndose la unión cuando alguna de las dos regiones tiene un vecino más similar que el que se está evaluando o cuando alguna de las regiones colindantes ha sido ya unida a otra en la iteración en curso (ya que el nuevo agregado podría ser el vecino más similar de una de las dos regiones). De está manera, se forman primero regiones más homogéneas, a las que progresivamente se van incorporando regiones más peque?as y de más alto contraste con su XIX
RESUMEN__________________________________________________________________ alrededor. Así, la imagen ‘labelizada’ (en la que cada píxel tiene como DN el identificador numérico de la región a que pertenece, salvo los píxeles de borde, que tienen su DN=0) que representa el estado de la partición se va simplificando progresivamente durante la segmentación, hasta que todas las regiones superan el tama?o mínimo especificado. Además de constituir las unidades básicas propuestas para efectuar una clasificación orientada a objetos, las regiones resultantes de la segmentación forman un retículo que podría ser la base de un procedimiento de fotointerpretación asistida, en la que el/la intérprete tan sólo tiene que seleccionar y eliminar (con clics de ratón) los arcos irrelevantes, es decir aquellos que considere que separan regiones que para él/ella son lo mismo. Otra aplicación inmediata del método es la verificación/actualización de mapas forestales mediante la detección de zonas incongruentes (regiones básicas que presentan una apariencia diferente de las demás regiones circunscritas en la misma tesela). El modelo en tres niveles del territorio y el proceso para construirlo se basan en tres hipótesis. La primera, que permite pasar de la versión idealista a la realista, asume que la variación espacial conjunta de los campos geográficos ideales asociados a los atributos biofísicos de la cubierta coincide en su mayor parte con la variación conjunta de luminancia de las bandas que componen la ortoimagen multiespectral. Dicho de otra forma, la hipótesis de coincidencia presupone que los bordes que aparecen en la imagen de gradiente conllevan un cambio significativo en alguno de los atributos biofísicos relevantes para la clasificación de la cubierta de las regiones separadas por esos bordes. El siguiente paso para construir el modelo es reconocer que el único modo de abordar la heterogeneidad jerárquica del paisaje consiste en establecer un nivel de generalización por debajo del cual carece de interés conocer la estructura interna de las partes que componen las unidades que se pretenden identificar. Este reconocimiento se materializa en la hipótesis de tama?o: para que un rodal o mancha tenga interés y por tanto cabida en el modelo, este debe superar una cierta extensión mínima, que en principio se puede asimilar a la de la unidad mínima cartografiable. Por tanto, aquellas regiones primarias (derivadas de la morfogénesis) que no lleguen a ese tama?o tendrán que ser agregadas. Este último proceso es guiado por la hipótesis de correspondencia entre similitud radiométrica (espectral) y similitud semántica (taxonómica) en regiones adyacentes, que asume que si dos regiones vecinas tienen la misma apariencia en la imagen, entonces es muy probable que correspondan al mismo tipo de cubierta.
RESUMEN__________________________________________________________________ El método para obtener la partición inicial del territorio, que sirve de punto de partida para la clasificación, se ha implementado en un lenguaje de programación especialmente adecuado para procesar imágenes digitales (IDL) y se ha probado tanto en imágenes multiespectrales como en ortofotos pancromáticas aéreas. Los resultados preliminares se adaptan razonablemente bien a la estructura espacial de la imagen. Cada región definida se percibe como una unidad que presenta una cierta coherencia interna que además se ve diferente de lo que la rodea. Sin embargo, los resultados también revelan una serie de problemas: Poca consistencia, en áreas de bajo contraste, de los bordes resultantes de aplicar el método con diferentes parámetros del filtrado inicial. Aunque esta circunstancia era previsible, indica una falta de robustez del método en ausencia de un contraste claro entre regiones diferentes. Dependencia del resultado de la frecuencia de actualización física de la imagen ‘labelizada’, que no se efectúa en todas las iteraciones por razones de economía de cómputo. Este problema impide de momento que el método se pueda aplicar a imágenes grandes (mayores de 4 Mpixels) que requieran ser subdivididas en imágenes más peque?as para su procesamiento, ya que se producirían incongruencias a la hora de casar las subimágenes. Efectos fractales en los bordes resultantes, que se evidencian por que la longitud del perímetro de una región cualquiera crece indefinidamente al disminuir el tama?o del píxel, es decir, que los bordes de las regiones son tanto más complejos cuanto más alta es la resolución. Estos efectos pueden dificultar la vectorización de la partición. El trabajo futuro se puede centrar en tres áreas, relacionadas respectivamente con el modelo conceptual del paisaje, el método de segmentación, y la clasificación de las regiones resultantes de ésta. El modelo en tres niveles es susceptible de una definición más formal por medio de un conjunto lógico-matemático de axiomas, definiciones, propiedades y relaciones. Las deficiencias referidas del método de segmentación se pueden solventar con relativa facilidad. Por un lado, la significación radiométrica de los bordes de la partición inicial en microcuencas se puede evaluar a partir de su prominencia geodésica, y partir de ella se puede simplificar la partición inicial para que no incluya bordes débiles. Los problemas de actualización de la partición durante las iteraciones se pueden remediar si se convierte la partición de microcuencas a una capa vectorial de tipo polilínea en la que cada arco tiene un identificador relacionado con el de las regiones que separa además de una serie de atributos que registren el valor medio de los píxeles del arco en cada una de las bandas. Y los efectos fractales pueden ser mitigados si se ordenan los vértices en una jerarquía de escala similar a la XXI
RESUMEN__________________________________________________________________ de las regiones clasificadas, de forma en cada escala de visualización, tanto el número de regiones representadas como la densidad de bordes permanecen más o menos constantes. Por último, la clasificación de las regiones de partida requiere por un lado definir una serie de atributos sustitutivos de los biofísicos sobre los que apoyarla, y por otro, una serie de reglas heurísticas para efectuarla. Una posibilidad es aplicar el método ELECTRE de ayuda a la toma de decisiones multicriterio. Primero, se estima el rango de valores admisibles de cada atributo en cada clase, para lo cual se requiere información fidedigna de inspecciones sobre el terreno o de fotografía aérea detallada, contrastada por la información del mapa que se pretende actualizar. Después, para cada región i se compara el valor que toman en ella los atributos con el rango de valores de cada clase c. Entonces el conjunto de atributos es dividido en tres partes, según el valor de estos sea concordante, discordante o indiferente con la proposición ‘la región i es una instancia (ejemplo) de la clase c’. La indiferencia surge cuando hay un intervalo de valores dentro del cual el atributo puede tanto apoyar como negar la proposición, o cuando el dato falta o es inconmensurable con la clase en cuestión (algunos atributos pueden ser aplicables a unas clases y no a otras). Con base en esta división, se calcula un índice de verosimilitud para cada clase, de forma que para cada región se van descartando las clases menos verosímiles. El proceso se realiza en varias iteraciones, ya que el estado de regiones vecinas puede afectar el índice de las clases de la región en cuestión. El procedimiento termina cuando la mayoría de las regiones presenta una clase mucho más verosímil que las demás que además se mantiene estable. Las regiones que no han alcanzado este estado se marcarían para su inspección bien por un operario en pantalla o en el terreno si el anterior no puede resolver la duda, y tras la comprobación, las nuevas teselas del mapa actualizado se definirían como conjuntos máximos de regiones conectadas de la misma clase.
In the last decades, there has been a trend towards automated landcover mapping. Nevertheless, the overall accuracy of the maps produced in this way is normally below the user’s requirements. Consequently most landcover mapping projects still rely to some extent on photointerpretation, which is less cost-effective and more subjective than the former. As the focus is increasingly shifting to monitoring rather than simple map making, there is a need to improve quantification and modelling. But this need cannot be fulfilled with traditional automated methods. The aim of this thesis is to contribute to an ongoing effort aimed at solving this dilemma by integrating into the analysis the principles of object-oriented modelling.
The traditional approach to the analysis of satellite images for landcover mapping, which is mostly based on the classification of signatures (multivariate samples drawn from pixels), leads to unsatisfactory results mainly because it hardly exploits the spatial structure of the images. In addition, this approach conceives the territory as made of distinct homogeneous surface materials, and hence cannot account for the pervading hierarchical heterogeneity of landscapes. Furthermore, the approach is in many respects incompatible with the hierarchical patch model underlying modern Landscape Ecology. The latter conceives classes as referring to types of geographic objects (patches) of a particular scale. These objects are complex compounds of objects attached to the Earth –like trees and buildings, and can be viewed as nested integrated wholes that differ somewhat from their surroundings. In being complex and nested, they are inherently heterogeneous, hence they can hardly be analysed as homogeneous materials at a single scale of observation.
The alternative approach that will be followed here is to derive, based upon the spatial structure of the image, a fine partition of the scene, in which each region exceeds the minimum size required for the geographic objects nested at the hierarchic level of interest. This partition constitutes the baseline for an object-oriented classification in which those regions are the basic units. The conceptual framework underlying such partition, and a general automated method for achieving it, are also given. The former is formalized into a three-tiered model of the territory, in which the last tier is the landcover map itself. The latter is based upon morphological segmentation and uses three consecutive techniques, namely adaptive diffusion filtering, gradient watersheds and region merging.
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________
"A map is not the territory it represents, but if correct, it has a similar structure to the territory, which accounts for its usefulness" Alford Korzybski, Science and Sanity (1933)
1.1. Introduction The UN Conference on the Human Environment, held at Stockholm in 1972, was a first milestone in the process of raising public awareness on environmental issues, a process that reached its maturity after the Rio Conference in 1992. As a result, most governments around the world now have ministries and regional agencies devoted to environmental protection. To pursue this goal, these institutions need reliable information on relevant themes of the territory under their control. Consequently, the demand for geo-referenced environmental information has grown considerably in the last decades, at a pace akin to the development of the technologies that enable it. Based on this information, the authorities make decisions, and the public make up an opinion that in turn influences those decisions. Thus comprehensive up-to-date geographic information on the environment is a must for a wealthy society concerned with environmental quality. This need is even more conspicuous in countries like Spain, where every year many infrastructures are built, there are land use changes from rural to urban and from agriculture to forest, and thousands of square kilometres are burnt by wild fires. From all the layers of information involved, landcover1 is perhaps the most important, since actions and regulations in a region are prescribed according to its nature and condition. Landcover influences markedly the productivity, vulnerability and biodiversity of ecosystems, and has a crucial impact on biogeochemical cycles, albedo, and ultimately global climate. Thus information on landcover is essential for a proper management, monitoring and preservation of our environment. This information has been traditionally presented in the form of hardcopy maps, and more recently, as digital layers integrated in a geographic information system (GIS). Within GIS, not only the information can be visualised at any scale, but combined with other themes and linked to databases for quantitative analysis.
Generally speaking, landcover is the biophysical cover of the Earth solid surface. Landcover mapping units refer to both natural (and seminatural) vegetation types as well as other landcover types as e.g. agricultural fields, forest plantations and, of course, non-vegetated types like urban areas and lakes (Millington & Alexander 2000).
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________
In the last decades, as a result of progress in Computing Science, Remote Sensing (RS) and GIS, there has been a trend towards automated mapping. Nevertheless, the overall accuracy of the maps produced in this way is normally below the user’s requirements. Consequently most landcover mapping projects still rely to some extent on photointerpretation. But this procedure is not free of problems. Apart from being less cost-effective, the manual drawing of polygons involves a great deal of subjectiveness. Therefore conclusions about changes in landcover between consecutive updates made in this way are basically unreliable. As the focus is increasingly shifting to monitoring rather than simple map making, there is a need to improve quantification, modelling, and ultimately prediction (Green & Hartley 2000c). But once again, we are confronted to the low accuracy of current methods. The aim of this thesis is to contribute to an ongoing effort aimed at solving this dilemma by integrating into the analysis the principles of object-oriented modelling. Object-orientation is the use of objects and classes in analysis, design, and programming. In particular, Object Oriented Analysis (OOA) seeks to model the world by identifying the classes and objects that form the vocabulary of the problem domain (Booch 1991). The object-oriented approach is especially useful for representing and interpreting the enduring structures of domains, integrating relevant physical entities into a coherent relational framework. This thesis deals with the linkage of three domains of decreasing complexity: landscape, RS images and landcover maps. Hence it crosses three disciplines: landscape ecology, RS digital image analysis and geographic information science. The goal is to construct an object-oriented path from patches to polygons by means of image segments, for which the first part is paved with an automated method. The traditional approach to the analysis of satellite images for landcover mapping, which is mostly based on the classification of signatures (multivariate samples drawn from pixels), leads to unsatisfactory results mainly because it does not make use of the spatial information embedded in the images. In addition, this approach conceives the territory as made of distinct homogeneous surface materials, and hence cannot account for the pervading hierarchical heterogeneity of landscapes. Furthermore, the approach is in many respects incompatible with the hierarchical patch model underlying modern landscape ecology. The latter conceives classes as referring to types of geographic objects (patches) of a particular scale. These objects are complex compounds of sessile objects like trees and buildings, and can be viewed 3
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ as nested integrated wholes different from their surroundings in one way or another. In being complex, they are inherently heterogeneous. In contrast, for the traditional approach, any arbitrarily delimited part of a classified region is still a referent of the concept conveyed by the class, provided it is large enough as to include the elements producing a typical signature of that class. The alternative approach that will be followed here is to derive, based upon the spatial structure of the image, a fine partition of the scene, in which each region exceeds the minimum size required for the geographic objects of interest. This partition constitutes the baseline for an object-oriented classification in which those regions are the basic units. The term object is used here to refer to a distinct region in the image that corresponds to a patch on the ground, and that can be viewed as an instance of some class (i.e. as a referent of a geographic concept like e.g. ‘pine forest’). The attributes taken into account in the classification include many spatial (e.g. size, shape and location of the regions) and contextual (relations between neighbouring objects, subobjects and superobjects) aspects that are not considered in the traditional approach. The conceptual framework underlying such partition, and a general automated method for achieving it, are also given.
1.2. Thesis overview The remaining of this chapter deals with the implications of mapping landcover with satellite imagery. First, it is concisely explained what landcover maps are, how they are made, and what is the view of reality underlying such maps. Then the importance of scale is stressed and linked to the hierarchical structure of the landscape. Later on, the role of earth observing satellites in landcover mapping is set forth, and the information provided by the images they acquire is thoroughly discussed, including the influence in it of (not only spatial) resolution. This part is concluded with a detailed account about information on floristic composition. In the last part of chapter 1, the analysis of satellite images for mapping purposes is addressed. It is suggested that the existing methods can be reduced to two main approaches, according to the order in which the identification of objects takes place: object-based (where the boundaries of the objects –patches- are defined prior to the determination of their content) and pixel-based (where the objects are retrieved by connecting after classification pixels 4
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ equally labelled). The treatment of the first approach, except for a short comment on photointerpretation, is left for subsequent chapters, whereas the second is briefly described and then criticised in depth. Next, the path already followed towards object orientation is briefly studied. At last, the main points are summarized through an historic account on how we came to use the data this way, and on why this way is not longer suitable. Finally, the objectives and main contributions of the thesis are listed. Chapter 2 is devoted to the construction of a conceptual framework for grounding objectoriented analysis of satellite images for landcover mapping. A short introduction gives way to a discussion on cognitive categorisation, in particular, the concept of partonomies (part-whole hierarchies) is introduced and linked to taxonomies through the theory of granular partitions (Smith & Brogaard 2000b). Granular partitions are ways of structuring reality (by dividing it up into meaningful chunks) in order to make it more easily graspable. Landcover maps involve the construction of two reciprocally dependent granular partitions: one over the attribute domain and the other over the territory. Later on, the geographic objects created by the mapping activity are precisely defined and their ontological status explored through the study of the boundaries that enclose them. The vagueness inherent to these boundaries is tackled with the aid of supervaluationist precisifications on their possible location, that are condensed into a probabilistic epsilon band that in turn is approximated with the aid of a raster partition. In addition, the notions of geographic field (a regionalized variable) and measurement disk (the areal template over which class attributes are measured) are introduced. In successive sections, the practical drawing of boundaries is addressed with an example on a hypothetical forest map. Throughout the example additional issues are discussed like the admissible size of disks, the dependence of spatial homogeneity on the observer, the existence of gaps inside objects, the need for a minimum mapping unit (MMU) and for mosaic (semantically heterogeneous) objects, and the unavoidable appearance of the modifiable areal unit problem (MAUP). This part is ended by introducing some fundamental concepts of geographic modelling, and most importantly, the topological theory of attractors (Thom 1975) is presented as the general tool with which to demarcate physical objects, in particular the minimal objects compounding an image.
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ Having set forth a comprehensive set of basic concepts for geographic modelling, an existing multi-tiered framework for spatio-temporal databases (Frank 2002) is adapted to construct a general landscape model for landcover mapping. The model attempts to i) justify the validity of the use of RS images to produce spatial information on landcover, and ii) formalize the conceptual foundations previously stated into a model. The model provides a framework that states explicitly how the objects created by the analysis relate to the underlying real world. It constitutes the basis of a classification method that is oriented to the construction of geographic objects from its very initial steps. Since the properties measured by remote sensors relate only indirectly to the biophysical properties used to classify landcover, the model is presented in two versions, the idealistic or I-model, and the realistic or R-model. In the I-model there are unlimited resources for the measurement of landcover properties, while in the R-model the resources are finite. In both versions tier-0 is the commonsensical reality itself, which become measured in the next –field- tier. From the measurements, a set of continuous (I-model) or discrete (R-model) fields are created. The former represent biophysical landcover attributes, whereas the latter are constituted by satellite images. In the case of the I-model, the next and last tier (classified objects) can be directly derived from the previous one by either attribute thresholding or morphological segmentation followed by aggregation into mappable zones. In contrast, in the R-model there are severe limitations, since it is unfeasible (due to economical reasons) to measure extensively the biophysical properties intended for classification. In this case, the construction of the last tier is not trivial, since the relationship between the images and the actual values of the properties of interest is uncertain. The hypothesis linking the R-model to the I-model and the former to reality are set forth and thoroughly discussed, and a general method to obtain the last tier of the R-model is proposed. Finally, the inadequacy of the traditional approach to image classification is again addressed in the light of the new concepts introduced in this chapter.
Chapter 3 introduces an automated method that creates the starting scenario, or baseline partition, for object-oriented classification in the framework of the R-model. The method transforms the input RS image (either single or multiband) into a vector layer consisting of basic mappable zones, termed granules. The method is tentatively implemented as follows. The first step consists in applying a new non-linear diffusion filter that leads to a piecewise constant image where textural features are suppressed. A gradient magnitude image is 6
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ subsequently computed from the filtered image. This image is then used as the input to the watershed algorithm, whose output is a primal sketch of the image with the contour of blobs (small homogeneous regions, darker or brighter –or with a different hue- than their surroundings). These blobs constitute the initial regions of a novel region-merging algorithm that aggregates them until all the regions in the partition exceed the MMU size. The dissimilarity measure used to compare regions takes simultaneously into account both luminance and chromatic contrasts. Finally, the resulting partition is vectorized, and optionally some granules attributes may be compiled in the associated database. The chapter is completed with some examples of the results achieved in real images, which are graphically displayed and discussed. Finally, in chapter 4, the main conclusions of this thesis are summarised, and future work and open research questions are listed.
1.3. Landcover mapping: from paper to GIS Landcover information is obtained through an inventory via which data is collected and recorded. For many years, the spatial database resulting from the inventory was a drawing on a piece of paper, the landcover map. Using a topographic map as background, various symbols, colours and text codes together with legends were used to display polygons representing semantically homogeneous patches of each landcover type that occur in the geographical area of the map. Other additional information was given in accompanying narratives. The scale was dependent on the scope of the map and the cartographic base available, typically ranging form 1:25,000 to 1:250,000. In the last few decades, these landcover polygons were delineated manually by interpretation of aerial photography (photointerpretation) and then portrayed on a map using standard cartographic methods. The interpretation process consists in the identification of homogeneous regions in the image, and it is based in the visual differences that different landcover types create. The major features used for this task are texture, tonal contrast or colour, pattern, size, shape, and context. Taken together, these features make up a diagnosis that allows landcover to be mapped without having to visit every polygon on the ground. Nevertheless, there is always a need for field verification in order to assess the map accuracy, i.e. the frequency of labelling and/or delineation errors in the map. 7
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________
The overall process of landcover mapping in the conventional way described above is too costly and time consuming as to allow timely updating. It requires a considerable amount of specialized manpower and possibly an ad hoc photogrammetric flight. The interval between consecutive updates for this kind of maps is typically greater than ten years, which is clearly insufficient for most applications, e.g. monitoring deforestation or urban development. However, technical advances in the last three decades have completely changed the picture. First, computing technology has lowered dramatically the cost of data storage and increased amazingly the processing speed. Second, telecommunications now enable a massive worldwide instantaneous flux of information. An third, Remote Sensing technology has increased steadily the spatial, spectral and temporal resolution of data acquired from satellites. Nowadays landcover maps are no longer visual spatial databases printed in a paper sheet. They are integrated in computer-based GIS, which manage large geographic databases structured in thematic layers that can be combined easily to match specific user needs. GIS enable the interaction between statistical analysis and mapping. The expanding GIS software market has also made easier and cheaper to access and use land inventories. Accordingly the demand on this kind of information is growing exponentially.
1.4. Landscape models, landcover maps and the reality behind Landcover variation within a landscape (a portion of solid Earth surface on the order of 1 to 1000s of km?) is driven by natural and/or anthropic processes, and can be modelled as continuous (gradients) or discrete (mosaics). Thematic maps represent surface variation in a highly generalised, selective and abstracted form (Ehlers, Edwars, & Bedard 1989), i.e. they describe a model of the territory rather than reality. The landscape model underlying most landcover maps is a discrete one, the piecewise homogeneous model. In this model, the landscape consists of a mosaic of contiguous geographic objects of irregular shape and size that cover the plane exhaustively. Each object is considered internally homogeneous at the level of abstraction of the map legend, i.e. the corresponding real surface on Earth has only one meaning in the language of the map, and this meaning is not the same than the one of adjacent objects. The homogeneity is more conceptual than real (this issue is discussed further
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ in the next chapter). Each object is represented in the map by a polygon, or basic mapping unit. These units are identified according to the similarity of a set of attributes or properties, as floristic composition, physiognomy (plant cover and size distribution), physiographic characteristics (type of soil, slope, aspect), disturbance history, etc. The number of i) attributes taken into account, ii) categories within each attribute, and iii) possible combinations of categories between attributes, constitutes the classification scheme of the map, i.e. the phrase book (available polygon labels) and the grammar (rules to form labels) of its language. It is selected according to the focus (e.g. land management or planning, forestry, hydrology, cadastre, biodiversity conservation, etc) and scope (local, regional or national) of the map. Classification schemes can usually be nested in a hierarchy of increasing level of abstraction, from communities to biomes. At the same time, the landscape can be perceived as a spatially nested patch hierarchy, where patches at each level may consist of smaller patches (Wu & Loucks 1995). Such view constitutes a new paradigm in Landscape Ecology (the Hierarchical Patch Dynamics of Wu and Loucks (Wu & Loucks 1995)), that goes a step beyond with respect to the piecewise homogeneous model by acknowledging that pattern, process and scale are inseparable. The hierarchical patch model underlying such paradigm is the one that will be followed in this thesis (see appendix 1 for more information on the implications of the hierarchical structure of the landscape). Note that the plain term patch refers to a spatial unit differing from its surroundings in nature or appearance (Wiens 1976), with no implications to its scale. That is to say that generally speaking, a patch can be from an isolated tree to an island continent. Within the patch hierarchy, one has to choose a level beneath which finer patchiness can be neglected. In the context of this thesis, that basic level consists of patches defined as a contiguous area of similar dominant species composition and percentage of cover, occurring in an area of similar physiographic characteristics (soil, slope, aspect). Note that this definition implies that the surroundings of the patch show a different physiognomy and/or physiography, hence it conforms to the above general definition of patch. This unit represents the lowest level of abstraction considered (see 1.9 for a justification), and to avoid confusion will be hereafter
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ referred to as landcover patch, to distinguish it from polygons, which in most maps will consist of an aggregation of these basic units. Note also that there exists a direct relationship between the level of abstraction of the map, its degree of spatial generalisation, and the mean size of the polygons, since hierarchical partitions of the attribute domain create potentially hierarchical partitions of the spatial domain (Bittner & Smith 2001a). This can be better explained as follows. The world can be conceived as organised in hierarchies, so that components at different levels differ in size roughly by orders of magnitude (Salthe 1985). Since each level of abstraction is focused on a particular level of the physical hierarchy, the more we generalise the bigger the objects of interest. Understanding the what, the why and the how of the hierarchical organisation of the biosphere facilitates realizing the implications of scale in landcover mapping. In order to help myself in this task, I wrote a note on the issue that can be found in appendix 1. Although I think it is worth reading it now, those not interested in such questions can safely skip it, albeit some concepts introduced in it will be used later in the thesis.
1.5. The issue of scale An important consequence of the pervading hierarchical structure found in nature is that knowledge on the real world is also layered. To put an example, a plant pathologist does not need to know the configuration of molecules inside the cell of an abnormal tissue to diagnose a disease, and neither need a forester plant physiology to estimate the height growth of a tree, and neither a meteorologist forestry to forecast regional weather. Since knowledge on the real world is acquired through observation, the consequence is that each layer of knowledge has its own appropriate scale of observation. The latter is normally done through an instrument, as a microscope or a camera.
The range of scales at which the observation is useful depends on the size of the objects
of interest of each particular layer of knowledge. Given a particular object like e.g. a tree, the
upper bound of this range is the largest scale at which the object is still recognizable, while the lower bound is the minimum size of the field of view that completely encompasses the object. When there is no a priori information about the size of the objects, or when the variability in size of the objects is too high, it is not trivial to establish the appropriate scale of observation (Lindeberg 1994). 10
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________
This concept is closely related to John Wiens’ (1989a) domain of scale: a region of the scale spectrum within which patterns and their relationships with underlying processes either do not change or change monotonically with changes in scale. Each domain corresponds to the range of scales at which the objects nested in a particular level of the hierarchy are visible. The transition, or scale threshold, from one domain (e.g. the set of scales at which individual trees are still observable) to an adjacent domain (e.g. where the trees are no longer visible and the forest become apparent as an object of its own), may be relatively abrupt and characterized by complex non-linearities arising from dissolvence/emergence processes (appendix 1), much like phase transitions in physical systems. Thus relationships between variables may not be easily predictable between domains, and this may cause erroneous inferences like the individualistic fallacy (to derive properties of the aggregates from the ones of the individual components) or the ecological fallacy (the other way round) (Alker 1969). It is important to note that scale thresholds do not occur simultaneously for all the objects compounding a given level of the landscape hierarchy. This is due to the fact that these objects are generally sufficiently variable in size that the different levels in the hierarchy overlap in size (Woodcock & Harvard V.J 1992). This view is supported by the relatively gentle slope of the scale variance graphs that Townshend and Justice (Townshend & Justice 1988) found when they tried to show that there is still information at virtually any spatial resolution as imagery is degraded to coarser scales. The consequence is that there is no single optimal scale to study landscape at a given level of abstraction. The scale range of observation depends on the instrument and is bounded on two sides: the inner scale is the smallest detail seen by the smallest aperture (a CCD element in SPOT HRVIR or a cone or rod in the human eye); the outer scale is the coarsest detail that can be discriminated, i.e. the whole image, scene or field of view (Haar Romeny 1997). The outer scale can be seen as well as the geographic extent of the study area, since several images can be mosaicked to cover the whole area if the latter is larger than a single image. The inner scale is equivalent to spatial resolution in Remote Sensing literature, and is closely related to the ground-projected instantaneous field of view (GIFOV) of the sensor. The pixel size, i.e. the ground sampling interval (GSI) or ground area corresponding to each data element in Remote Sensing standard products, is chosen to match the GIFOV, although the point spread function (PSF, related to the variable sensitivity of the detector within the area of integration) of the 11
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ sensor usually yields an effective GIFOV coarser than the geometric one (Schowengerdt 1997). Nevertheless, within this thesis, pixel size has the same meaning than spatial resolution, although this does not necessarily mean that it coincides with the size of the smallest ground object that can be reliably detected (Davis & Simonett 1991). The classic cartographic scale, i.e. the constant ratio between distance between pair of points on the map and distance between the same pair of points on the Earth surface (Goodchild & Quatrocci 1997), is chosen so that the objects of interest can be clearly seen at glance on the map, and the smallest size of them (the minimum mapping unit, MMU) still seen by the naked eye. Thus the extent and level of detail of a map are somehow related to respectively the field of view and maximum resolution of the human eye, so that the density of visual information portrayed in maps of different scale is roughly the same (Frank & Timpf 1994). To end up this account by connecting it to the previous section, it can be said that the scale of the map is determined by the layer(s) of knowledge about which the map is aimed to inform. Each scale corresponds to a certain level of abstraction of the Earth surface. Therefore each level of abstraction has a direct correspondence to the size of objects of interest, landcover patches in our case. Now we turn to the modern means that provide information about them.
1.6. The role of Earth Observation Satellites in landcover mapping Remote Sensing1 (RS), as understood in this thesis, is the measurement of some electromagnetic properties of the Earth’s surface (land and oceans) through sensors mounted on aircraft or satellites. Satellite Remote Sensing is generically known as Earth Observation (EO), and the measurements supplied –usually in the format of digital images-, EO data. Land EO began with the launch by NASA of the ERTS-1 (Earth Resources Technology Satellite, later renamed Landsat-1) in 1972. The Multi Spectral Scanner (MSS, four bands) on board this satellite provided for the first time a consistent set of synoptic (185 km swath2), high resolution (80 m) images to the scientific community. Since then, the number and capabilities of EO satellites have grown steadily. Currently there are more than 30 civilian satellites providing data of 30 meters or better resolution (Stoney 1997).
The term "remote sensing" was coined by Dr. Evelyn Pruitt in the 1940's when she was with the U.S. Office of Naval Research. The term generally implies that the sensor is placed at some considerable distance from the sensed target, in contrast to close-in measurements (Short 2002). 2 The swath is the width of the strip of the Earth’s surface imaged in each orbit.
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ The present EO satellites provide an alternative source of information on landcover more cost-effective than aerial photography. First, EO data are cheaper. For example, it takes some 750 1:50,000 aerial photos to cover a 185x185 km? Landsat 7 scene that costs only 700 euro (2 cents per km?) and has a slightly inferior level of ground detail (15m resolution in panchromatic band). Even 1m resolution satellites as Ikonos costing 20 USD/ km? are still cheaper if the task requires a new flight. Another benefit of EO data is their digital format. Spatially the data form up a matrix or raster1 composed of cells, each one having a digital number (DN) which may be related, normally after some correction or calibration, to the biogeophysical characteristics of the piece of land to which that cell corresponds. The result of displaying (or printing) this data is a grey-level image or band. This is done by relating each raster cell to a picture element (pixel) of a screen through a Look Up Table (LUT), which assigns a discrete brightness level to each DN. This association is the reason why the term pixel is equated to data element in EO data. The LUT entries are usually chosen as to enhance the image contrast. Since colour monitors have three guns (Red, Green and Blue), it is possible to display three bands at a time, forming a RGB colour composite (Pinilla 1995). Multispectral images typically consist of several bands, each one corresponding to a particular interval of the electromagnetic spectrum to which the respective detector is sensitive. The location of these intervals in the spectrum is selected as to produce a good discrimination between different ground features. This digital nature enables a number of quantitative computer-assisted analysis that yield objective estimates about landcover and other features of the Earth surface that can subsequently be integrated into a GIS. Finally, another important advantage of EO data is the readiness with which they can be purchased. There are several private companies that distribute in a commercial basis data from satellites operated by themselves or by space agencies, and thus relieve users of all data acquisition problems. Most of these satellites fly almost along meridians in a sun synchronous Low Earth Orbit (LEO, ranging from 250 to 800 km altitude), permitting repeat intervals from 3 to 60 days depending on the swath of the image. This means that for most places of the
Altough strictly speaking, the term ‘raster’ refers to a scanning pattern of parallel lines that form the display of an image projected on a cathode-ray tube of a screen, it is widely used instead of grid (a pattern of regularly spaced horizontal and vertical lines forming squares on a map, a chart, an aerial photograph, or an optical device, used as a reference for locating points).
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ world, it is possible to purchase from 1 to more than 20 cloud-free optic high resolution images a year, depending on the cloudiness at that site. The latter constraint does not hold for Synthetic Aperture Radar (SAR) images, although the information on landcover that can be brought forward through their analysis is more limited. In short, the multitemporal and multispectral capabilities of EO satellites and the inexpensiveness of the data they provide make them superior to conventional aerial photography as a source of up-to-date landcover information, even more when there are already satellites providing resolutions equivalent to the detail of the photos. The problem now is how to derive this information.
1.7. The information content on landcover of EO data Information comes ultimately from physical order. Order implies differences, that is, nonuniform distributions of matter/energy. Certain systems having sensors can take advantage of these differences for cognitive purposes, provided they can detect them. From all the set of detectable differences (latent information), only the relevant ones are used by those systems to construct a handy representation (structural information) of the sensed scene, that has to be formalised into a model in order to be communicated. If the system has cybernetic – communication and control- capabilities, and the model is taken into account in its decisionmaking process, then it becomes functional information. The meaning of that information is the prediction of the success of the selected action, and its value depends on the consequences of that action. If the result of the action is the one expected, then this information is stored as (reusable) knowledge. These are the conclusions I have drawn from a limited inquiry I have made on the concept of information. I felt it was apposite to elaborate on it, given the lack of a clear definition in the literature. Interested readers can find the full text in appendix 2. Another conclusion is that satellite data have no fixed information content, it rather depends on the task at hand. In any case, it is the structure of the image, formed by luminance and chromatic changes taking place across it, the only bearer of (latent) information. From all (either spatial or spectral) patterns that can be observed in an image, only those coming from objects that are to be detected and modelled are relevant. Thus the goal of image analysis is to make explicit these patterns by drawing (either spatial or spectral) boundaries separating the objects of interest. Later it is 14
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ assumed that the objects created by the analysis have a fixed correspondence (given a fixed set of viewing, illumination and atmospheric conditions) with the objects in the real world that gave raise to the (spatial and spectral) structure of the image. Before going on, I would like to make clear the concept of the spectral structure of a multiband image, since it is not as intuitive as the spatial one. Such structure is derived from signatures usually drawn from individual pixels. A signature is an n-component vector where each component or dimension corresponds to a band, and the value at each component is the DN of the pixel in the corresponding band. Such signature can be plotted as a dotted curve (that obviously can also be interpolated to draw a continuous curve) where each dot is the value in one of the consecutive spectral bands of the image (see figure 1-3 in page 21). Since this is the form usually adopted by the measurements of desktop spectrometers, hence the name of signature. This vector can be also viewed as the coordinates of a data point in an n-dimensional stochastic space1 usually referred to in the Remote Sensing literature as the feature space, which here will be termed the data space. The non-uniform arrangement of signatures (which usually tend to cluster into more or less discontinuous regions) within this space forms a structure that is seized in the analysis. Albeit the adjective spectral is more restrictive than feature, the latter is more ambiguous (it is also used for ground objects), therefore the former will be preferred in this thesis when talking about the structure of the data space. This does not mean that the discourse on multi-band images holds only for optic ones; rather, it could be applied to any multi-band image, no matter the nature of the measurement. Finally, note that instead of individual pixels, signatures can also be drawn from groups of connected pixels. For this purpose, the image is previously partitioned into homogeneous regions and the mean value of the pixels inside each region is used as a signature. In this case, the data space is less densely populated than when each signature consists of a single pixel. Last but not least, it is important to note that, following the discussion in appendix 2, information is not extracted but produced during the analysis. The result of the analysis is the imposition of a simpler, meaningful structure upon the intricate original one, that is, a formal representation (a model) of the structure of the image. This model is dependent not
Where the location of the data point can be interpreted as the result of an stochastic process. A stochastic process is an a priori indeterminate process that is characterized by a distribution. If there are many and independent elemental effects, the distribution of the process is a Gaussian (Koch 1988).
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ only on the data alone, but also on the definition of the type of objects we want to foreground and on how detailed the representation should be. Different choices during the process of analysis will yield different representations (and hence different information) from the same data set. In order to emphasize the former assertion, note that the etymological origin of informare (‘to give form’, in Latin) implies the realisation of structure upon some material. Thus the image can be seen as the raw material upon which the model is carved out. The final form of the representation is dependent not only on the material but
Figure 1-1. Duck or hare?
also on the analyst. However this creative freedom is by no
means unrestricted. In the example of figure 1-1, if the task is to identify an animal you could argue whether it is a duck or a hare, but if you see a cat, you would probably be invited to visit a psychiatrist. Hence structural information is subjective, but it has also a structure similar to the one of the piece of reality it refers to. The latter property is the hardest constraint to the output of a sound image analysis, and indeed is this isomorphism1 what accounts for the usefulness of the result (recall the opening quotation of this chapter). So now, what is the information content of EO data on landcover? Assuming the piecewise model of 1.4, and having set a certain level of generalisation according to our objectives, the objects of interest for the study of landcover are a set of geographic objects to be represented in the map by polygons. The goal is to identify these objects within the area of the map and get answers about the size, shape, location and inherent properties of each one of them. The properties to be studied are those considered in the classification scheme (vegetation physiognomy, floristic composition, and so on). The information content of a data set can only be defined in relation to this scheme. Intuitively it would be the fraction of questions identifying the objects, as defined by the scheme, that can be answered reliably through the analysis of these data. Since there is a surrogate relationship between the defining properties of landcover and the ones measured by remote sensors, the analyst may answer some of these questions using EO data. Although the mutual implications of this relationship will be
Strictly, a term in mathematics for an exact correspondence between both the elements of two sets and the relations defined by operations on these elements. Obviously here the correspondence refers only to the elements of the territory that are represented in the image, otherwise we would be saying that RS images are territories!
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ addressed with further detail in the next chapter, some considerations on its nature must be done now. Satellite sensors measure Earth surface electromagnetic properties, such as the reflectance (indirectly, see footnote on next paragraph) of solar radiation in different intervals of the spectrum, in optic sensors, or the fraction of a power pulse emitted by a SAR instrument that is scattered by the surface back to the radar antenna (backscatter), in active microwave sensors1. The analysis of these measurements can bring forward information on our objects of interest, since the electromagnetic behaviour of a landcover patch is dependent on a certain number of characteristics that can be linked directly, through allometric equations2 or by other means to the properties considered in the map, as we shall see in the next paragraphs. In the case of optic sensors, canopy apparent radiance3 as observed from a satellite is a function of: i) structural properties such as leaf area index (LAI, half the total green leaf area in the plant canopy per surface unit), leaf size, shape and inclination, horizontal and vertical distribution; ii) optical properties (reflection, absorption and transmission from, within and through tissues, which depend on their biochemical composition and thickness) of leaves, other vegetation elements (bark and flowers) and soil; iii) incidence and viewing angle; and iv) atmospheric conditions (Asrar 1989). In the case of SAR instruments, radar backscattering from vegetated areas depends on three groups of factors: i) target structural attributes (soil roughness and canopy architecture, that is, size, shape, distribution and orientation of scatterers: leaves, branches and trunks); ii) target dielectric properties, mainly controlled by the moisture content of soil and canopy; and iii) wavelength, polarisation and look angle used by the system (Ulaby & Dobson 1989). The wavelength λ of the emitted pulse determines the penetration depth of the waves in the imaged target. Thus the scattterers for X-band (λ=3cm) will consist of the first leaves on top
Readers not acquainted with radar imagery and willing to get some basic insight may refer to “Principles of radar imagery”, FAO Remote Sensing series No 62, 1989. 2 Equations commonly used in Biometry, in which one part or characteristic of a living entity (e.g. the Leaf Area Index (LAI) of a forest) can be expressed as a power function of another part or characteristic (the Basal Area (BA) of that forest). 3 The apparent radiance is the amount of power density coming from the sun scattered by both the canopy and the atmosphere (path radiance) in the direction of the sensor. Radiance has units of Watts per square meter per steradian, whereas reflectance, an inherent property of surface materials describing what proportion of the incident energy is reflected, is dimensionless (Richards 1993). In principle, it is possible to derive estimations of reflectance from radiance measurements. NASA is currently testing a new instrument on board EO-1, the Atmospheric Corrector (AC), that will provide accurate correction for atmospheric variability (primarily water vapor and aerosols), enabling accurate reflectance measurements for land imaging missions.
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ of trees, whereas for L-band (λ=23 cm) they will be thick branches and trunks. Apart from this, volume scattering tends to depolarize the pulse, so that polarization may provide information on the shape and orientation of the scatterers (FAO 1989). These features will be exploited by the European TerraSAR mission (http://www.infoterraglobal.com/terrasar.html), to be launched in 2005. TerraSAR will make near-simultaneous observations of the Earth in high spatial resolution up to 1 m in X-band and with full polarimetry and high radiometric resolution in L-band. Apart from backscatter intensity images, coherence images, derived from pairs of SAR complex (phase and magnitude) images acquired over the same scene at different times (therefore from slightly different orbits), deserve a comment. Coherence is a statistic for describing the quality of interferogrammes (that record the differences in phase between the two images), which may be used for landcover mapping complementarily to backscattering intensity. It can be interpreted as the fraction of power scattered by unchanged parts of the scene. Coherence over a certain area is determined by phase decorrelation within an averaging window, often caused by random displacements (due to wind) of the contributing scatterers, or by a random thickness change of an intervening dielectric medium (due to e.g. rain or snow). Therefore information in coherence images comes from the differential behaviour of each landcover type, in terms of relative permitivity and geometry of the scatterers, when exposed to different wind and moisture conditions (Hobbs, Ang, & Seynat 1998). Finally, it is important to mention that a spaceborne LIDAR (Light Detection and Ranging, that is, an improved laser altimeter), will be available shortly. The Vegetation Canopy LIDAR Project (VCL), the first NASA Earth
Figure 1-2. Lidar waveform diagram (Image by Robert Simmon, NASA Earth Observatory)
(http://essp.gsfc.nasa.gov/vcl/), due to launch on 2000 but delayed because of financial problems,
is designed to provide a global database of forest vertical structure (layering) and tree height. Lidar sensors measure elevation by bouncing laser light off of a surface and measuring the time the light pulse takes to return. By also recording the shape of the "waveform" of the returned signal, the location —in the space between the ground and the tree tops- where the foliage, trunk, and branches are concentrated can also be estimated (figure1-2). 18
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________
Coming back to the point, the relative contribution of landcover properties to the actual remote measurement together with the stability of that contribution define the strength of the surrogate relationship. If the contribution is high and is always the same, the conclusions drawn from data regarding the property of interest are likely to be right. In the other extreme, if the contribution is low and/or changes randomly from one location or time to another, there will be a lot of uncertainty about the conclusions drawn, if there are any at all. For most instances, the situation is an intermediate one. A good example would be the normalised difference between the near infrared band and the red band of an optic sensor over a timber forest. This parameter, known as the Normalised Difference Vegetation Index (NDVI), is highly correlated to LAI, which in turn can be used to estimate basal area, and the latter together with the mean tree height are the main variables used to estimate timber growing stock, that could be a key element to map the forest into stands for management purposes. However this correlation is not stable, it tends to decrease when LAI is high due to the early saturation of the red band (Peterson & Running 1989). In addition, the more factors external to the map classification scheme are involved in the sensor response (atmospheric and illumination conditions, sensor gain, variable extrinsic factors affecting land objects, as snow or drought), the higher the uncertainty of the information produced with the data. The reliability of that information decreases with the magnitude of these external factors. An extreme example of this would be an optic image of a completely cloudy scene, in which there is no information at all about ground objects. Finally, there are also factors related to the resolution of the measurement, which will be discussed in the following section. To sum up, the information content of an image can only be defined in relation to a specific task with definite objects of interest. Furthermore, it can only be measured after having performed the analysis, since it is in this process where the information is actually produced. Apart from the analysis itself, this ex post content relies on the strength of the relationship between the properties measured by the remote sensor and the ones defining the objects of interest. Hence it is dependent on the relative contribution in the sensor response of i) landcover properties taken into account in the map classification scheme; and ii) properties
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ external to that scheme. The reliability of the information derived from the data is directly proportional to the weight of i) and inversely to the one of ii).
1.8. The impact of resolution on the information content Not only the nature of the sensor measurement affects the information that can be derived from it, but even more markedly, the level of detail or resolution of that measurement will determine its potential to bring forth useful information on the Earth surface through analysis. Resolution has four dimensions: radiometric, spectral, spatial and temporal, that will be discussed in the next paragraphs. Radiometric resolution is the number of bins into which the continuous range of possible sensor responses is quantized. The photons reaching the detector during the exposure time (in the order of milliseconds) at each sampling position (pixel) are converted to an electrical signal and then quantized to a discrete number that is expressed in binary digits (bits). As with all digital data, a finite number of bits (BD, the bit depth) is used to code the measurement, so that the DN can be any integer in the range [0, 2BD – 1] (Schowengerdt 1997). This dynamic range varies from the 8 bits [0,255] of Landsat and SPOT products to the 16 bits [0,65535] of Envisat ASAR PRI (precision radar image). From the four aspects of resolution, the radiometric one is perhaps the less important with regards to information. Visually, it is virtually impossible to distinguish e.g. a 5 bit (32 grey level) rescaled image from the 16 bit original one. From the quantitative analysis perspective, (Narayanan, Sankaravadivelu, & Reichenbach 2000) showed that the results of classification over a TM scene did not change significantly by reducing the bit depth from 8 to 4 bits, suggesting that a great deal of memory can be saved even before applying any compression on the data set. The reason behind such an apparent waste is that the sensor must be able to measure very different situations without reaching saturation and without losing discrimination power over a relatively homogeneous scene. Spectral resolution applies only to optical (and thermal) remote sensing. Multispectral (several bands) and hyperspectral (tens of bands) sensors split the reflected solar light into multiple paths with the aid of prisms, and have different spectral filters in each path, so that 20
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ several to tens (even hundreds) of bands can be created, each one displaying the amount of energy received in a particular interval, or band, of the electromagnetic spectrum. Remote Sensing spectrometry relies in the fact that reflectance values for each material vary according to the wavelength distribution of the incident radiation. Since this variation is unique for each material, the response of the sensor in different bands can be used to identify the material from which the imaged surface is made. Therefore, the greater the number of bands and the narrower their width, the higher the spectral resolution and the more accurate the identification can be. Spaceborne hyperspectral imagers are already available. The Hyperion instrument, on board NASA EO-1, provides 220 contiguous spectral bands (from 0.4 to 2.5 ?m, each band of 10 nm width, see figure 1-3) with a 30-meter pixel size. From the hyperspectral data, it is possible to construct a spectral curve that i) can be matched with spectral signatures of pure materials (endmembers, that is, the centroids of spectral
Figure 1-3. Hyperion’s 220 bands (hashed line) provide a more accurate depiction than the broad bands of Landsat (dots) . (Graph by Robert Simmon, NASA Earth Observatory)
classes) available in spectral libraries; or ii) can be searched for specific reflectance peaks and absorption troughs indicating the presence of
special materials; or iii) can be analysed with some spectral unmixing technique if the spectral curve is suspect of being a mixture of several materials. Spatial resolution can be equated to the pixel size, i.e. the ground sampling interval (GSI, see 1.5). Geographic space variation is regularized through the convolution of the signal coming from individual ground objects within this sampling field (Jupp, Strahler, & Woodcock C. 1989). In other words, the measurement at each pixel is an averaging (weighted by the PSF) of the radiance incoming from the ground area corresponding to that pixel. If the objects of interest are smaller than the pixel size, it is virtually impossible to derive any conclusion about their shape and size as individual entities (e.g. the crown size of individual trees from a 30 m resolution Landsat image, or the boundaries of a crop paddock from a 1.1 km resolution AVHRR image). If pixels are in the same size range of objects, the correlation between object properties and sensor response will be maximal for pixels fully comprised within objects, but 21
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ in turn there will be a high proportion of pixels including several objects with different properties where the mixture will degrade the correlation (Woodcock & Strahler 1987). Finally, if the objects of interest exceed by far the pixel size, the parts (e.g. trees) composing the object (landcover patch) become apparent, and the measurements of individual pixels can be related more readily to the properties of the parts than to the ones of the whole, complicating the analysis of the data. In turn, precise estimates of the shape and location of the objects of interest can be drawn. The first case in the previous paragraph is referred to in the literature as the L-resolution model, whereas the one in this paragraph is the H-resolution model (Strahler, Woodcock C., & Smith 1986), where H and L stand respectively for high and low. As these authors pointed out, these concepts can be extended to time as well as space, and will be referred to as H-frequency and L-frequency models, to distinguish them from the previous models. Temporal resolution refers to the repeat frequency of the measurement. This frequency has an impact on the information that can be derived from the data, since the electromagnetic properties of a given area are by no means stable. Change happens over time scales from seconds (leaves orientation changes due to wind, that affect e.g. interferometric coherence) to centuries (e.g. successional processes) (Davis & Simonett 1991). H-frequency implies that the observations are repeated at higher frequency than significant changes in the measured properties (e.g. daily NDVI from AVHRR), so that abnormal abrupt changes can be located in time (e.g. floods, fires). L-frequency means that the repeat interval of the sensor is too long to detect changes of interest whose effects disappear before the next observation (e.g. a flash flood remaining only a few days may not be detected by the 35-day repeat cycle ERS-2 SAR, and the same can be said for Landsat TM about seasonal vegetation changes in a site where cloudiness allows cloud-free images only in the dry season). As with spatial scale, the information content on landcover of a multitemporal series of satellite images from the same scene depends on the properties about which information is searched. If the aim is e.g. to monitor processes as urban development or deforestation, a series of SAR images acquired in the same month of consecutive years is appropriate, but it cannot bring more information than a single image about landcover attributes that depend on the seasonal variability of vegetation or the year cycle of different crops. Conversely, a series of 10 SAR images of the same year may bring enough information as to discriminate between 22
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ deciduous and evergreen forest or between different crops, but if used to monitor deforestation, the observed seasonal changes can led to wrong interpretations.
1.9. An example: information on floristic composition from EO data In summary, the information content of EO data about a particular theme depends on i) the type of analysis performed (which will be addressed in next section); ii) the size and defining properties of the objects of interest of that theme, iii) the relative contribution of these properties to the sensor measurement, iv) the relative contribution of other factors not related to these properties, and v) the resolution (spatial, temporal, spectral and radiometric) of the measurements. The impact of these factors will be illustrated with an example about an important property of landcover patches: their floristic composition. But first a definition of floristic composition within this context is needed. The characteristics of ecosystems are determined by the primary trophic level, the vegetation, so that the latter can be taken as surrogate of the whole ecosystem (Graetz 1990). For this reason landcover classifications use vegetation as the principal geographic phenomenon to classify the Earth surface into landcover types. Each landcover type is an abstraction of a collection of geographic objects (represented in the map by polygons) summarising the set of common attributes shared by these objects. At a broad level of abstraction, vegetation can be described by its overall appearance, i.e. its physiognomy, without entering in floristic details. The most significant physiognomic features are the height of the uppermost, or dominant, existing stratum (that would yield e.g. forest-shrubland-herbaceous categories), and the proportion of ground covered by that stratum (e.g. dense forest-woodland-sparse woodland). A lower level of abstraction would require additional information on the phenology (deciduous v. evergreen) and shape (broad-leaves v. needles) of leaves most frequently found in the dominant stratum. If a further level of discrimination is required (e.g. we want to know the quality of timber we could extract from a conifer forest) then floristic information is needed, ranging from only identifying the dominant species (this would be enough for the former example) to a complete floristic inventory with the occurrence and frequency of plant species in all the strata (this would be required for e.g. a biodiversity study). 23
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________
On the other hand, the value of spatial data decreases with the outer scale (geographical extent), and its cost increases with the inner scale (resolution). Vg the scale of observation appropriate for a comprehensive floristic study requires expensive and time-consuming field surveys. These cost/benefit considerations make unsuitable to base landcover mapping in units finer than the landcover patch defined in 1.4. Therefore studying floristic composition within this context means to identify the main species occurring in the dominant stratum of the vegetation, which can be observed by EO satellites as opposed to the understory. The size of the diagnostic elements ranges from the few centimetres of the herbs of a grassland to the several tens of meters of the trees of a forest. The scale of observation ranges from millimetres (as for e.g. counting the stamens of a flower) to tens of meters (e.g. to appreciate the shape of a tree). The properties that lead to the identification are related to the morphology (type of flowers, fruits, leaves, bark, ramification, and crown) of parts of the object or of the whole object (plant individuals). Once identified, a set of other properties can be brought forward by previous knowledge accumulated by the sciences of Botany and Ecology, as timber quality, periods of flowering and fructification, requirements in terms of light, rain, temperature and soil, association with other species, etc. Conversely, an expert can use this knowledge to make a heuristic conjecture on the species composition when the scale of observation is inadequate. This allows e.g. a forester to identify a tree species from several hundred meters when there is enough contextual evidence. The contribution to EO measurements of the properties used in the identification is low in comparison to other factors. There are some relationship between inherent properties of different species as the optical thickness of individual leaves and the response of an optic sensor, or as the size and shape of leaves and the radar backscatter, that can be exploited to derive floristic information. The problem is that the measured response may be overweighted by factors that are not intrinsically specific, as the amount of green matter in the case of optic sensors or the relative dielectric constant in the case of radar. Despite this shortcoming, the focus of analysis has been on identifying the spectral signatures of the classes of interest. The reason behind this approach is that the spatial resolution of EO
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ data has been too low1 to identify single trees by their shape or leaves (Landgrebe 1999). As a matter of fact, the only spatial difference traditionally taken into account in quantitative analysis of EO data has been texture2. But, even acknowledging that different landcover types may yield distinct textures for a given pixel size, the problem is that it is not trivial to determine, for each location in the image, the region from which the texture measure should be derived (Lobo 1997). This issue deserves a one-paragraph detour. Texture measures are usually produced by computing a statistic (e.g. variance) from the DNs inside a rectangular template (e.g. a 5x5 pixel window) centred at each pixel. The resulting image is subsequently included in the analysis as another ‘spectral’ band. Apart from being a mixed bunch, this approach has three serious problems. First, window-derived measures cannot distinguish bare texture from relevant geometrical patterns, since the technique is aspatial, i.e. it does not take into account the position of each value within the window (Ferro 1998). Second, depending on the size of patches relative to the pixel, there may be a lot of locations for which the respective window will include pixels from an adjacent patch, yielding a confusing measurement. And last but not least, as a consequence of the former, classification accuracy using texture is far more dependent on the size of the computing window than on the type of texture measure employed (Marceau et al. 1990). Turning back to the vegetation classes commonly studied with EO data, they usually consist of broad categories (broad leaved forest, coniferous forest, grasslands, etc), but sometimes they include some floristic element in the definition (e.g. Fagus sylvatica forest). In any case, the underlying assumption of the spectrometric approach is that each class has a consistent signature that can be separated from the rest, that is, patches of the same class will show a similar response, and this response will not be similar to the one of patches of a different class. Unfortunately, this is seldom the case: even correcting for atmospheric and illumination effects, one cannot count on a vegetation class having a consistent spectral signature, no matter how high the spectral resolution is. The radial structure of classes (discussed in 2.2.3),
It has been historically unfeasible, for both economical and technical reasons, to reach higher resolutions. Higher spatial resolution implies huge data volumes and faster downlinks, improved optics and electronics, and precise platform control. On the other hand it leads to reduced swaths, hampering a quick coverage of large areas. 2 Texture is the local variation of brightness in an image caused by the irregular response of spatial structures that consist of recurrent elements (e.g. leaves in IKONOS or trees in Landsat) that cannot be resolved by the sensor because of their reduced size in comparison to the sensor’s GIFOV. If one thinks of a digital image as a tangible surface given by DN(x,y), then texture could be assimilated to the roughness sensation of that surface when passing a finger across it (Schowengerdt 1997).
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ which may include in the same group very different settings, precludes the very existence of ‘class signatures’. Some of the parameters affecting vegetation reflectance relate specifically to particular species, like the morphology of individual plants, including the size, distribution, orientation and optical properties of leaves. But many others do not, like the structure of the canopy, the nature of background soil and/or understory, time of the year, growth stage (even-aged stands), plant health, and moisture content (Armitage, Weaver, & Kent 2000). Besides, solar light interaction between the different parts of the vertical profile of a vegetated area is a rather complex one, involving multiple scattering and selective transmission among the parts. This interaction is even more intricate when the canopy is not dense enough and the trees do not cover totally the ground, as is the norm in Mediterranean landscapes. This means that the overall response cannot be reduced to the summation of the individual responses, making spectral unmixing unfeasible when the pixel size is in the same range than this interaction. The classical pattern of reflectance from vegetation (low in the visible and high in the nearinfrared) depicted in textbooks is based on the response of single leaves. Therefore it can only be assimilated to the one of the canopy if LAI is high enough (Curran 1985). In this case, the identification of canopy species might be feasible with an adequate spectral resolution. Under other circumstances, the correspondence between spectral response and species composition would be difficult to identify. Regarding temporal resolution, the different rhythms between species in response to seasonal changes (defoliation and flowering) create variations in the sensor response that can be successfully used to discriminate between species when H-frequency data are available. Hence intra annual spectral-temporal profiles, together with knowledge of crop calendars and vegetation phenology, can be used to map different landcover types and even individual species (Reed et al. 1994). However, there are three limitations to this approach. The first one is due to inter annual variations of phenological events, related to changes in accumulated precipitation and temperature from one year to another (Weber 2001) that may alter the known sequence upon which the identification is done. The second refers to variations regarding the general profile due to latitude and/or altitude. The third and last is the difficulty to get a complete yearly profile due to cloudiness, so that some key observations may be missing. 26
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________
The latter impediment does not hold for radar images, although (Proisy et al. 1999), studying the foliage seasonal cycle of a mixed forest using ERS SAR (C-band), concluded that it could not be reliably detected, partly due to the strong contribution of branches in the backscatter. As a point of fact, it is widely accepted that it is not possible to identify individual tree species with the current spaceborne radar systems (Quegan et al. 2000;Wagner, Vietmeier, & Schmullius 2000), although the advent of multifrequency multipolarimetric radars may change the picture. In short, the information content from EO data on floristic composition, that is, the possibility of identifying individual canopy species through their analysis, is poor. This is mainly due to the inadequate scale of observation and to the variable contribution of species intrinsic properties to the sensor response. However, different combinations and abundance of species produce differing patterns in the image (Armitage, Weaver, & Kent 2000). These differences could be better exploited if the focus of the analysis shifts from the spectral signature of classes to the more realistic one of the spectral signature of individual patches, as proposed by Smits and Annoni (1999). This shift implies that an estimation of the form (size, shape and location) of the patch has to be available before inquiring about its substance (physiognomic and floristic properties).
1.10. Analysing EO data to derive information on landcover The goal of image analysis, as stated before in relation to the study of landcover, is to produce a handy representation of the imaged scene by identifying in it the objects of interest, landcover patches in our case. The identification of an object involves two aspects: i) its form, i.e. the boundaries of the object; and ii) its substance, i.e. the constituents from which the object is made. I consider the form aspect as related to the spatial structure of the image, whereas substance, although not alien to the former, has more to do with the spectral structure. In any case, the result of the analysis is a thematic map of the territory under study that portrays the type of land cover object that is expected to be found at each location. Again, the assumption is that the objects created by the analysis have a definite correspondence with the objects in the real world that gave raise to the (spatial and spectral) structure of the image. 27
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________
Two further assumptions will be made within the scope of this thesis. The first one is that imaging geometric distortions (due to relief effects and off-nadir viewing angles) can be adequately rectified, and that referencing to a geographic planar projection system (as e.g. UTM) is performed accurately, so that positional accuracy is good enough (that is, that the actual coordinates of the centre of a given pixel are within that pixel). The second one is that the final format of the map is vector rather than raster, in other words, that the minimum mapping unit (MMU) is several times larger than the pixel. Under this circumstance, it is far more economical (recall the MDL principle introduced in Appendix 2) to deal with polygons than with individual raster cells. The MMU concept is the cornerstone of the conceptual framework proposed in this thesis, and will be further discussed in chapter 2. Having taken for granted the correct pre-processing of the available images (co-referencing, resampling and geocoding), it can be said that there are two main approaches to image analysis, according to the order in which the identification of objects takes place: objectbased analysis (or first form, then substance methods) and pixel-based analysis (or form from substance methods). In the first case, a segmentation (i.e. a partitioning into nonoverlapping regions, or segments) of the image is carried out as a first step in the analysis. The resulting segments are subsequently classified using inter alia their spectral properties, which are usually estimated from the mean values of the pixels belonging to each segment. After having been classified they become objects endowed with an enduring identity. In the second case, pixels are labelled (i.e. classified) according to their location in the multidimensional data space, without considering (apart from perhaps adjacent pixels) their position within the image. Afterwards, objects are defined as sets of connected pixels with the same label. This latter step often involves a previous change in the label of some pixels as to enable the vectorization of the classified image, otherwise unfeasible due to the lack of spatial consistency of the result. This patching-up process is commonly referred to as postclassification. Object-based analysis will be discussed with greater detail in chapter 2, but before passing over to pixel-based methods, a mention should be done on a human-vision-driven objectbased method: photointerpretation. It was developed together with the use of aerial photography as a means for geological and forestry surveys, and is based in the visual differences that different landcover types yield. The differences are exploited by the 28
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ photointerpreter in order to divide the image into homogeneous regions that usually are later classified with the aid of a limited field survey. Visual interpretation of EO data is still common practice, since up to now, there is no computer program able to emulate the perceptual and abstracting capabilities of humans. However, humans show some serious limitations in order to fully exploit latent information of data. First, the number of grey levels distinguishable to the human eye (say some 16) is considerably smaller than the dynamic range of most EO data. Similarly, humans can only compare three bands simultaneously (RGB colour composites). Yet the main drawback is that visual interpretation is based on subjective judgements, as for example where to draw a sharp boundary between two patches that blend gradually into one another. Actually the subjectivity of boundary placement is the major factor contributing to positional error (Green & Hartley 2000c) of the produced thematic map. This inconsistency may cause serious problems in the updating even if it is done by the same person (Ahlcrona 1995), making unreliable any conclusion about changes in landcover drawn with this method. These shortcomings, together with the slowness, high cost and scarcity of skilled interpreters, have directed the research effort towards automated methods. Given the multidimensional nature of most EO data and their limited spatial resolution, spectral patterns have been favoured against spatial ones, so that pattern recognition in Remote Sensing is seen exclusively as a waveform discrimination problem (see e.g. (Fukunaga 1972)). With this view, the signatures of individual pixels became soon the undisputed basic units of the analysis. The dark side of this approach, so dark that many practitioners are still unaware of it, is that the latent information embedded in the spatial structure of the image is almost completely ignored. A brief description of pixel-based image classification is given in the next section, including a systematic critique.
1.11. A critique to pixel-based image classification In every scientific community there is a general accepted framework, constituted by formal theories and trusted methods, that shapes customary work (Kuhn 1962). In the case of the community of practitioners of RS digital image analysis, such framework originated from the older tradition of reflectance and emissive spectrometry. The latter consists of a series of instrumental techniques developed in chemistry and physics for determining the composition 29
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ and other properties of materials, from organic compounds to stars. Such framework, that seizes the multiband nature of RS imagery, will be called hereafter the spectrometric approach (SMA). SMA is what in RS literature is called quantitative analysis. Since most ‘quantitative’ methods are based in one way or another on waveform discrimination, I prefer the former name. Although waveforms are not necessarily spectral, they are analogous to spectral signatures, hence the name. The spectrometric approach uses pattern recognition methods to group individual waveforms, or signatures, into classes. A signature consists in an n-component vector where each component usually is the value taken by a given individual pixel in each of n bands. Hence the basic (areal) unit of the analysis is the individual pixel. Pixel-based classification is so widespread (as matter of fact, it is practically the only classification featured in most commercial RS image analysis products) that it constitutes the prevailing paradigm of this discipline. In this section I discuss the inappropriateness of using individual pixels as the basic units of the analysis, in the context of landcover mapping. In general, the validity of the spectrometric approach is not questioned here, although it will be addressed in the last section of chapter 2. Image classification, as conceived by the spectrometric approach, is the process of delineating the regions of the multidimensional data space associated with each class of interest wi (i=1,..., M). The data space is populated by signatures that have to be allocated to some userdefined class. In the pixel-based paradigm, each signature comes from a single pixel. The set of classes must meet simultaneously three conditions (Landgrebe 1999): Exhaustiveness: there must be a class to assign to each pixel in the image, i.e. there can be no unclassified pixels.
Separateness: the classes must be separable to an adequate degree in the data space. Usefulness: the classes must be of informational value for users and meet their needs.
The separability constraint requires that, as a general rule, signatures from different classes are relatively distant from each other in the data space. In other words, if a signature is to be correctly classified, then the majority of its nearest neighbours in the data space should belong 30
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ to the same class, i.e. it should have a pure neighbourhood. Conversely, when two or more classes occupy the same tracts of the data space, i.e. when they overlap, the signatures populating these tracts will have a mixed neighbourhood and therefore it is likely that they are confused. Therefore the degree of overlap between two given classes will account inversely for their separability (Schowengerdt 1997). The classification is carried out with the aid of a set of discriminant functions gi (one for each of M classes), such that given a signature X, gi (X) is greater than the other gj when X belong to class wi. In other words, X is classified as a member of class wi if and only if gi (X) ≥ gj (X) for all j=1,2,... M (Landgrebe, 1999). In order to build the discriminant functions, some amount of ground information (obtained e.g. by field surveys), showing the ‘true’ class of known locations within the scene, must be available beforehand. An exception to this occurs when the classes are identified according solely to the clustering patterns found in the data space. In this case, known as unsupervised classification, the result is a partitioning of the data space into spectral classes that are assigned a posteriori to information classes (those appearing in the map legend) by an analyst based on ad hoc collected ground information. The number of clusters, or spectral classes, must be limited by the analyst, and the algorithm employed is usually aimed at minimising the variance within each cluster, that is, maximising the variance between clusters1. Conversely, common supervised classification methods use a set of training pixels from a priori known locations to estimate the multidimensional probability density function associated with each class, which is used as the discriminant function for that class. Then the most likely class for each pixel is selected, leading to a minimum average classification error given the estimated functions. This process is most simple (and this is why is so popular) for the case in which classes are normally distributed (as in fact it is often assumed), only involving the calculation of class mean vectors and covariance matrices (Landgrebe, 1999). In cases where the gaussian assumption is not suitable because some of the classes show a multimodal distribution (i.e. consists of several distant clusters), a method that has become a popular choice is an artificial neural network (ANN), where the training pixels are used
Recall that the total variance of a data set consisting of disjoint groups of data is the sum of the internal variance of the groups plus the inter-group variance.
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ empirically to construct the discriminant functions. A typical classification conducted with an ANN could be as follows. First, the parametric form (either linear or non-linear) of the discriminant function is chosen, and then the weights of each band are set to an arbitrary initial value. Later these values are adjusted iteratively, with the aid of the training pixels, as to minimize an error function. When the error is adequately small, the training process is stopped and the whole image is classified. By using an ANN one can achieve better accuracies than with statistical methods, at the cost of not knowing what has being done. This black-box problem (Benitez, Castro, & Requena 1997), together with a) the unstable balance between minimising the error function and avoiding excessive training (overfitting); b) the lack of theoretical basis; and c) the lack of definite criteria for choosing the best ANN architecture for a given task; has generated a controversy on the use of ANNs as a means for image classification (Egmont-Petersen, de Ridder, & Handels 2002). The dependence of the result on training data is not exclusive to ANN implementations. The accuracy of supervised classifications depends heavily on the quality of the training data, even more than on the actual classifier used (Buttner, Hajos, & Korandi 1989). Moreover, the same classifier can produce different results on the same image when trained with a different data set (Smits, Dellepiane, & Schowengerdt 1999). As a consequence, the result is prone to reflect inconsistencies in the selection of training samples. Thus ‘good’ training data must be fully representative of the respective class, so that they should be well distributed across the scene, and at the same time they should constitute a homogeneous sample of the class1. Since both objectives conflict, careful selection of training data is an arduous task. This problem is amplified in the case of hyperspectral data, since the number of training samples required to define the classes quantitatively grows very rapidly with the number of bands (Landgrebe, 1999). The accuracy of the classification is measured using a contingency table or confusion matrix that compares for each landcover class the predicted class with the actual one on the ground, computed from a subset of signatures from known areas that were not used for training. There are a number of methods to measure accuracy from this table (Janssen & van der Wel 1994;Stehman & Czaplewski 1998), the simplest being the percent correctly classified,
The homogeneity constraint is partially solved by picking up blocks rather than single pixels, because near pixels are usually correlated and therefore will show similar values. Collecting training data this way is fast and easy, but it violates the independency assumption of statistical sampling (another thing most practitioners are unaware).
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ usually called itself ‘accuracy’. In general it ranges from 60 to 85 per cent, which is normally below the user requirements (Lins 1994). Another inconveniences are the often biased spatial distribution of errors, and the significant differences in error rates that are usually found among the classes (Davis & Simonett 1991). The lack of spatial consistency, manifested by the mottled and noisy appearance of classified images, hinders seriously the integration of the result into a GIS when the latter requires the vectorization of the classified image. Therefore, in many cases the output image must be post-processed in order to repair such inconsistencies. In doing so, one is implicitly acknowledging a partial failure of the SMA when the classified units consist of individual pixels. Post-classification techniques take into account spatial context, i.e. the relationships between pixels in a neighbourhood, to improve the spatial consistency of per-pixel classifications. Examples are the various morphological filters applied to classified images, or the process of probabilistic label relaxation (Richards 1993). The neighbourhood of each pixel is usually defined by a 3x3 or 5x5 pixel matrix, or window, centred at the pixel. However, the assumption that the context of a pixel relies only on its first order neighbours can be increasingly restrictive with image resolution. On the other hand, enlarging the window increases the risk of including pixels belonging to a different patch and decreases the spatial accuracy of the result. Confusion increases when the pixel size is close to the mean size of the objects of interest (landcover patches), leading to a high proportion of mixed pixels located between adjacent patches, which are prone to be misclassified. When pixel size is much smaller than the patches, the proportion of mixed pixels will be negligible, but in turn the spectral variability of the patches will increase, the rate depending on landcover type, causing further classification problems (Hsieh & Lee 2000). The failure of the spectrometric approach in very high resolution (< 5m) data is partly due to the non fulfilment of a implicit assumption not acknowledged by many practitioners: pixel size should be large enough as to include a sufficient number of elements producing a typical signature of the class (Woodcock & Strahler 1987). The former assumption can be restated using Goodchild’s (1994) notion of spatial resolution of a classification. The latter may be defined as the minimum size of the circle, expressed by its diameter, over which the surroundings of a geographic point have to be observed 33
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ in order to define the label at that point. This circle will be called hereafter classification disk. If the classification is based on the presence and particular arrangement of some individual entities such as trees or buildings, it is obvious that the area of the disk should exceed substantially the extent of those (sub) objects, so that a sufficient number of them is included in the observation. Then the assumption requires that the spatial resolution of the imagery is coarser than the one of the classification scheme. Note that the minimum diameter is class-dependent, increasing with the size and spacing of the subobjects defining each class, e.g. grassland (say 1 meter) – dense forest (5-10 m) – sparse woodland (30-50 m) – urban (100-200 m). Therefore the spatial resolution of the overall classification scheme will be the diameter of the maximum minimal disk (the one corresponding to the urban class in the example). When spatial resolution of an image is coarser or close to that of the classification, (big) patches of all classes will likely appear as smooth surfaces. As the pixel size decreases, patches of some classes will begin to show an increasingly coarse texture, up to a point where their constituents become resolved, and therefore the spectral response of the sensor is more readily related to the properties of these individual entities than to the ones of the patch. Instead of accepting the fact that the spectrometric approach is not appropriate when applied to individual pixels of images of higher resolution than the classification, the community of practitioners, unconsciously willing to force nature into the box supplied by the paradigm, has either simply ignored the problem or patched it up with some post-classification technique. In summary, two basic assumptions underlie the spectrometric approach to image classification: 1) that the piece of terrain from which the measurement is drawn is large enough to include a sufficient number of elements producing a typical response of a landcover class; and 2) that each landcover class shows a negligible degree of overlap in the data space with the other classes. Now we shall see that both assumptions cannot be fulfilled simultaneously if the spatial distribution of the phenomenon under study is not taken into account, as e.g. it occurs when data units are drawn systematically from a grid. Suppose we establish a square plot on the ground of the same dimensions than the areal units (pixels) used for classification. Imagine further that we cover its sides with opaque walls, so that an observer standing on it cannot see anything out of the plot. Then, if the first assumption is valid, there should be enough evidence in the plot as to correctly classify it. 34
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ Otherwise the hypothetical ground observer would be liable to make the wrong guess, and this would happen more often as the plot size is reduced. Imagine the case of a 1m-side plot (analogous to an Ikonos pixel) that is randomly placed within a sparse mixed woodland. It would be impossible for the observer to identify this piece of terrain as a belonging to a ‘sparse mixed woodland’, no matter the actual position of the plot. Even if we allow observation in a 3x3 m2 plot (which would be analogous to contextual –neighbour influencedclassification), the identification would be wrong again. Now assume that the pixel size is large enough as to fulfil 1), and lets explore 2). This assumption has two underlying premises: a) that there exist natural groups, or clusters, within the data space; b) that each cluster has a prevailing landcover class, i.e. that most of the signatures compounding it belong to the same class. Premise a presupposes a non-uniform arrangement of signatures in the data space. This can be taken for granted, given the patterned structure of the imaged territory. A different question is the discovery of natural groups within the data space, since the concept of ‘natural’ is vague. Let us simply assume that naturalness is gradual and it refers to the existence of discontinuities, or quasi-empty space, separating the clusters. In the absence of such discontinuities, signatures close to the boundaries separating classes would have mixed neighbourhoods, and therefore would be liable of misclassification. Premise b is at the core of the spectrometric approach, for if there were clusters densely and randomly populated by signatures of different classes, the separability of the classes involved would be seriously compromised. Now lets study the case of mixed pixels, i.e. pixels lying on the boundary between adjacent patches of different class in the image. Their signatures consist of a mixture of typical signatures from two or more classes. Therefore, they are located in the region of the data space separating clusters of these classes, which in turn is supposed to be constituted of quasiempty space. This circumstance requires the proportion of mixed pixels to be negligible. But if assumption 1) is fulfilled, the only way to have a negligible proportion of mixed pixels is that the spatial configuration of the territory is a simple one, consisting of big patches (hundreds of times bigger than the pixel) with convex shapes (so that the edge density is low). Such configuration is the exception rather than the norm, since most landscapes can be
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ conceived as mosaics of fragmented, intermingled landcover types. Therefore, for most situations, assumption 2) cannot hold if the first one is fulfilled. The only situation in which both assumptions may hold is for a reduced set of broad classes that are spatially segregated over the territory (i.e. there are no holes of a different class within the patches). Since, as explained in 1.4, the more we generalise in geographic space, the larger the objects of interest, it can be expected that patches corresponding to broad categories are big enough as to keep at bay the number of mixed pixels. In any case, an approach that is only suitable for a few restricted situations should not be taken as a general paradigm, and this is the case of pixel-based classification, which is used irrespectively of the pixel size, the number and type of classes, and the nature of the territory. The problems of pixel-based classification are known since long ago (Markham & Townshend 1981;Woodcock & Strahler 1987). Basically, when the pixel size is below a certain threshold (given by the spatial resolution of the classification), intraclass variability is high, as well as class overlap, hindering separability. As the pixel size is enlarged, class variability is reduced at the expense of increasing the number of mixed pixels, and therefore decreasing the accuracy. The only novelty here is the perspective, consisting of two nested paradigms and their underlying assumptions, under which this problem has been re-examined. The wider paradigm conceives the territory as made of distinct homogeneous materials (landcover classes) with unique spectral properties that are spatially distributed into disjoint pieces larger than a pixel. The narrower paradigm establishes an exhaustive systematic (regularly distributed) sampling scheme as the means to retrieve the spatial distribution of the classes. The question is not one of determining whether pixel-based analysis is invalid or not, since there is no dichotomy from inadequacy to suitability but gradation. The point is that if we stick to pixel-based classification, we are never going to achieve the level of accuracy required by sound landcover mapping, since the basic assumptions of the spectrometric approach cannot be fulfilled simultaneously if the basic unit of the analysis is the individual pixel. It could be argued that the problem of mixed pixels could be tackled with spectral unmixing techniques, and I would not deny it. The problem is that the trend in RS imagery is towards higher spatial resolution, and then we encounter a hard nut to crack: the spatial resolution of the classification. 36
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________
In order to make the spectrometric approach temporarily compatible with a soft version of the object-oriented paradigm, let us now imagine that the measurements are not drawn from the cells of a regular grid superimposed on the territory but from a set of jointly exhaustive, mutually disjoint, irregularly shaped cells, or segments, forming a partition of the image. Suppose further that i) such partition keeps a good correspondence with the structure of the territory, so that the boundaries separating the segments correspond to discontinuities on the latter; ii) all the segments exceed the minimum size imposed by users as to be representable and potentially meaningful for them; and iii) all the segments are relatively homogeneous, so that their degree of homogeneity is higher than the one that would have the union of any given segment with any of its neighbours. This situation would be equivalent to an imaginary sensor consisting of detectors that are able to adapt their IFOV to the structure of the imaged scene, so that it deforms and expands the shape of the IFOV until it finds a discontinuity, regularising the response of the surface bounded by such discontinuities. In this hypothetical situation, each resolution cell would correspond to a segment instead of a square pixel. Hence the data volume would be considerably reduced, consisting only in one signature per segment. Since in principle there are no ‘mixed segments’, and the segments (in virtue of ii) are larger than the classification disk, assumptions 1) and 2) of the spectrometric approach can hold simultaneously under an incomplete version of the object-oriented paradigm that uses the same class concepts than the SMA. The term ‘object’ is used to refer to a region of the image having a unitary and cohesive identity that is closely related to the one of the geographic object that gave raise to such structure in the image. Once the object is identified and contextualised, the relation between the object and the region of the space occupied by the object is no longer one of identity, since objects may move, grow, shrink or even disappear, but regions by necessity are located where they are and have the extension and shape they have (see 2.2.7). Hence the term object is best suited than segment or region for a diachronic study of the territory. Perhaps ‘zonebased’ would be the best term, since zone (a region distinguished from adjacent parts by a distinctive characteristic) captures better what the partition consists of, but it is important to stress that the thesis is intended to contribute to an ongoing effort to change the way RS images are analyzed for landcover mapping: the shift towards object orientation. 37
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ 1.12. The quest for object orientation in Remote Sensing Customary analysis of RS images is based on the utilities built in commercial software like Erdas, Envi, ER-Mapper, and so on. The classification methods they provide, except for a few exceptions, rely on the spectrometric approach developed in the 70s. The common characteristic of all these methods is that they hardly exploit the spatial structure of the images. Their output is commonly considered unsatisfactory from an operational point of view, especially when the task is directed towards the maintenance of geographic databases within a GIS environment. As a result, RS products are not as frequently used for natural resource monitoring and management as envisioned by the engineers who developed them. Consequently the demand for RS images is overwhelmingly lower than the current supply. In other words, the capacity to acquire data exceeds by far the capacity to produce information from the data. This confronts space agencies and vendors with a serious problem. The former find difficult to justify expensive investment in Earth Observation programmes and desperately search for new users and applications. The latter sadly confirm one year after another that the pace of sales do not follow their expectations. The response to crisis usually takes the form of a paradigm shift that triggers new developments (Khun 1962). In Remote Sensing, it can be viewed as a shift from the spectrometric approach to the object-oriented approach. The latter uses objects in addition to classes in order to model the landscape. Generally speaking, an object is anything to which a concept applies. More specifically, an object represents an individual, unit, or entity, either real or abstract, with a well-defined role in the problem domain (Booch 1991). Vg a landscape object is a patch, defined as a discrete spatial unit having a certain minimum extension and differing from its surroundings in nature or appearance. A quite different concept is a software object, which is a code module that wields data and the code that manipulates the data into a single entity. However both have in common that any single object is an instance of a class. A class is a set of objects that share a common structure and a common behaviour. The class relations among objects are represented in ‘kind –of’ hierarchies (taxonomies) that provide inheritance, and structural relations among objects are represented in ‘part-of’ hierarchies (partonomies) that provide encapsulation (information hiding). The object-oriented approach is especially useful for representing and interpreting the enduring structures of landscape, integrating relevant physical entities (patches) into a coherent relational framework. In order to achieve such framework, two changes are needed: 38
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ on the one hand, the shift from individual pixels to image objects (groups of connected pixels that potentially correspond to patches) as the basic units of the analysis, and on the other hand, the shift from classes conceived as types of homogeneous materials to classes referring to types of landscape objects. This section is a brief and incomplete account on the path already followed towards the first shift. The second one, which has hardly been explored up to now, is addressed throughout Chapter 2, and its implications discussed in section 2.7. The textured appearance of RS images triggered early attempts to incorporate spatial concepts in their analysis, since texture is a disturbing factor for the spectrometric approach. Spectral classifiers were originally developed for Landsat MSS images at 80 m resolution, hence the assumption of low local variance (smooth texture) of landcover classes was more plausible then than now. Each pixel was considered as a sample measurement from a larger element (patch) made of a particular homogeneous material (landcover class), and as such it could be analysed independently of neighbouring measurements. Coarse texture areas were assumed to correspond to classes whose constituents have a size in the same range than pixels and therefore cannot be considered homogeneous at the resolution of the image. Note that the alternative interpretation that coarse texture is due to different classes interspersed at intervals close to the pixel size was discarded in terms of explanation but it was implicitly accepted as an anomaly referred to as ‘scene noise’. One possible solution to tackle coarse texture is to smooth the image with a filter at the expense of decreasing spatial resolution. If by any reason the original resolution is to be retained, then the only alternative is to divide the scene into regions defined as groups of contiguous pixels which may be presumed to belong to a common class and to extract average signatures from these regions. The easiest way of doing this is when there is a previously existing partition that can be assumed to reflect adequately the spatial distribution of classes within the scene, as e.g. a cadastral vector layer of agricultural fields where there is only one type of crop per field. The mean (and possibly the variance) of pixels within each field is used as the input for classification, and the accuracy of the result is considerably improved. This method, which was proposed as early as in 1969 (Huang 1969), is usually referred to ‘per parcel’ or ‘per field’ classification to distinguish it from other methods where the partition has to be defined from the image itself. The latter are based upon image segmentation.
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ The goal of segmentation is to partition the image into a set of jointly exhaustive, mutually disjoint regions that are more uniform within themselves than when compared to adjacent regions. Segmentation techniques provide a primary model of the spatial structure of the image that is used subsequently to form classified objects. The ECHO classifier developed by David Langrebe’s team at Purdue University in the 70s included a segmentation algorithm that can be viewed as the first milestone in the quest towards object orientation in Remote Sensing. The image was divided into ‘cells' of 2x2 pixels that were subject to a simple test of statistical homogeneity. Cells failing the test were assumed to overlap a boundary and were later classified on a per-pixel basis. Adjacent cells passing the test were selected and subsequently subject to hypothesis testing for statistical similarity. Cells found similar were merged or annexed into regions (note that although the image is processed in a single pass, similarity is considered transitive, and hence regions can be formed by strings of merged cells). ‘In this way an object can grow to its natural boundaries, whereupon either the cell selection or annexation test will fail’ (Landgrebe 1980). Although Landgrebe made available a Fortran implementation of ECHO (user’s guide included) to the community of practitioners, it seems it raised little interest. On the one hand, the homogeneity and similarity thresholds had to be set on a trial and error basis. On the other, the merging sequence could lead to unwanted results, since it allows the creation of regions with large differences between pixels at opposite extremes. In the end, I guess that most people thought that similar results could be achieved by post-processing the pixel-wise classified image, therefore they did not bother checking out the new method or developing a similar approach. The lack of interest in image segmentation is confirmed by the fact that only a handful of papers related to segmentation of RS images were published in the 80s (e.g. (Nazif & Levine 1984;Derin & Cole 1986;Cross, Mason, & Dury 1988)). As a point of fact, classical RS textbooks (e.g. (Richards 1993)) included no section on the subject. A contemporary survey by Haralick (Haralick & Saphiro 1985) classified existing segmentation algorithms in three classes: clustering, region growing and split-and-merge procedures. The second category was further subdivided into single linkage (proper region growing), hybrid linkage (edge detection) and centroid linkage (region merging). This classification is somehow confusing, since e.g. many region-merging methods can also be viewed as spatially constrained clustering methods (see below). However it is important to note that region merging (aggregation of adjacent regions) and region growing (annexation of 40
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ individual pixels to neighbouring regions previously formed from seeds) are frequently used as synonyms. Notwithstanding it, the term ‘region merging’ reflects better what the methods studied here do, and hence will be preferred. Another major contribution to the field is the stepwise optimisation algorithm of Beaulieu and Goldberg (Beaulieu & Goldberg 1989). It begins by considering single pixels as the initial regions. At each iteration, two adjacent regions are merged provided they minimise a heterogeneity criterion. The candidate pair that produces the least increment in heterogeneity (i.e. that shows the highest degree of fitting) is merged first. The initial regions are merged gradually in this way. Hence the process can be seen as a hierarchical agglomerative clustering constrained to adjacent regions. The height in the dendrogram of each merger is the value of the heterogeneity criterion for that pair. Partitions at higher levels of the dendrogram consist of fewer regions, so that this sequence of partitions may reflect the hierarchical structure of the image. Each partition is optimal regarding the minimization of the heterogeneity criterion, but the procedure is too slow, since it allows only one merge per pass (a strategy known as global mutual best fitting). Besides, it leads to an uneven growth of regions between areas of smooth and coarse texture. Woodcock and Harward (Woodcock & Harvard V.J 1992) introduced a faster algorithm that allows multiple mergences per pass. Given a target global threshold Tglob of a dissimilarity measure (analogous to the foregoing heterogeneity criterion), they set a series of intermediate pass thresholds Tpass (< Tglob) of increasing value. Then at each pass, two adjacent regions are allowed to merge if and only if: 1) neither regions has previously merged on this pass; 2) the distance between the regions is less than Tpass; and 3) each region is the most similar neighbour of the other (local mutual best fitting criterion). Note that condition 1) is only necessary for cases in which there are ties in the dissimilarity distance and thus there are more than one nearest neighbour. As these authors noted, the global threshold alone leads to inadequate results, since usually there is a great disparity in size of the output regions. Areas marked by coarse texture will consist of many small regions (often individual pixels), whereas smooth uniform areas will be segmented into a single large region. Therefore they supplemented their algorithm with some size constraints that prevented excessive growing in smooth areas and forced the development of regions exceeding the minimum size of the mapping unit in areas with high local variance.
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ The most important contribution of Woodcock and Harward’s (1992) paper to object orientation in Remote Sensing, rather than this algorithm1, is their nested-hierarchical scene model. This model assumes that the spatial structure of RS images reflects a hierarchy of nested levels, in which each object (e.g. a stand) can be conceived simultaneously as a whole made of smaller wholes (e.g. trees) and as a part of a larger whole (e.g. a forest). They pointed out that the piecewise homogeneous model underlying traditional segmentation methods is unrealistically simple, since it assumes that the objects of interest (landcover patches), as manifest in images, have internal variances that are both low and equal. Such assumption is inadequate for RS imagery, since different landcover types exhibit differing levels of internal variance given a fixed pixel size. As a result it is very unlikely that all the regions defined by a conventional segmentation method correspond to patches of the same level of the landscape hierarchy. Hence the need for size constraints focusing on a certain hierarchic level. This model is the counterpart in Remote Sensing of the hierarchical patch dynamics paradigm (Wu & Loucks 1995) in Landscape Ecology: “... The attempt here is to tie the hierarchical structure of images to the hierarchical nature of landscapes/classification schemes, and to note when this relation breaks down and why” (Woodcock & Harward 1992). Turning back to segmentation algorithms, the problem of beginning the region merging process by individual pixels was twofold. On the one hand, it was computationally expensive, and on the other, it precluded the usage of statistical dissimilarity measures as long as there were 1-pixel regions left. One of the first attempts to tackle this problem is the work of Lobo (Lobo 1997). Prior to the region merging stage, he applied an iterative edge preserving smoothing (EPS) due to Nagao and Matsuyama (Nagao & Matsuyama 1979). The output of the filter is a piecewise constant image made of tiny segments, where each segment represents a homogeneous region in the original image, darker or brighter than its surroundings (i.e. a blob). After labelling each segment, the resulting image can be taken as the baseline partition for the region merging stage. This partition can also be conceived as a primal sketch of the image, where the blobs conforming its spatial structure are formalized into segments. Although the filter he used was rather rudimentary (it was based on directional masks around each pixel that produced segments of biased round shape), the procedure contains the germ of object orientation, since these primal segments can be viewed as the basic objects that
Actually this algorithm was not the first of its kind to appear. Vg the Swedes (Hagner 1990) were already using a similar method for automated forest stand delineation, which used another dissimilarity measure and did not apply the mutual optimality criterion. Unfortunately Hagner did not publish his work in an international journal, and hence it can be assumed that it was unknown to most foreign researchers.
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ compound the image. Unfortunately, Lobo (1997) deviated from the object-oriented path in the next stage of his method, by considering these segments not as objects themselves but as collections of measurements drawn from an larger object (patch) made of a particular homogeneous (with gaussian distributed properties) material (landcover class). Hence he chose a statistical approach to carry out the subsequent region merging process, which he named Iterative Mutually Optimum Region Merging (IMORM). IMORM, likewise Woodcock and Harward (1992) segmentation, followed the local mutual best fitting criterion and the stepwise increase of the dissimilarity threshold, but imposed no size constraint, since it assumed a piecewise homogeneous model of the scene. That is to say that IMORM considers the set of pixels within a segment as a sample from a bigger stochastic population characterized by a gaussian distribution. Hence the goal of the region merging stage was to retrieve maximal sets of spatially connected samples belonging to the same population (class). The way to achieve such goal was to test the null hypothesis that the samples extracted from two adjacent segments are in fact observations of the same population. Note that the overall population is the set of pixels that belong to that class. In this sense, statistical segmentation relies both on the spectrometric and pixel-based paradigms. The dissimilarity measure used by IMORM was the normalized difference between two means of samples of normal distributions, or t-ratio1. If for a given iteration and candidate pair (adjacent segments which are the most similar neighbour of each other), the null hypothesis could not be rejected at a confidence level given by the current t-ratio threshold, both segments were merged. Although apparently sound, this approach reveals an inherent contradiction. The t-ratio measures the statistical significance of the difference between both samples rather than the magnitude of that difference. Therefore when the temporary t-ratio threshold is increased after an iteration, candidate pairs that formerly were considered significantly different may be merged. Such incongruity can be mitigated if the final threshold is low enough as to avoid undersegmentation, i.e. the merging of segments of different class. In any case, the conclusion is that the selection of a dissimilarity measure for early segmentation is not a trivial question. The classification stage of Lobo’s (1997) method returned to object orientation. The final segments were considered as patches (or part of patches) that were to be classified through a
t-ratio is the absolute value of Student t test. Again, Olle Hagner (1990) was the first in using this approach, but his work remained unknown for most non Scandinavian researchers.
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ linear canonical discriminant analysis. Each segment was treated as a basic unit defined by a set of attributes in a similar fashion as individual pixels are treated in pixel-based spectrometric methods, differing in that textural attributes are more efficiently extracted from segments than from kernels. However he did not use any spatial (size, shape) or contextual (relations between neighbouring segments) attributes. The problem is that contextual information tends to be declarative in nature (represented symbolically and independently of the methods to perform inferences on it), so that it is not trivial how to incorporate it into an algorithm. Early attempts (Goldberg, Goodenough, & Plunkett 1988;Moller-Jensen 1990;Ton, Sticklen, & Jain 1991) to analyse RS images using a knowledge-based framework (i.e. interpretationguided segmentation) did not result in any operational method, with perhaps a couple of exceptions. One of these was the System of Experts for Intelligent Data Management (SEIDAM, http://www.aft.pfc.forestry.ca/seidam_e.html), a Canadian prototype system of multiple expert systems (developed in the 90s and apparently no longer used now) that could update existing forest resource inventories and respond to queries by dynamically selecting RS data in a distributed GIS environment. The other is AIDA (T?njes et al. 1999), a knowledge-based system for the interpretation of RS data, which was the first system using a semantic net to formulate knowledge about scene objects. Semantic nets (Brachman 1977) are directed acyclic graphs where the nodes represent the objects of interest (including subobjects and superobjects) and the links form the relations between objects. The objects properties are described by attributes attached to the nodes. They have a value measured from the data and a range describing the expected attribute value for each type of object. Some of the attributes may be relational, taking e.g. into account topological relations that may affect the attributes of neighbouring nodes. The instantiation (allocation of an image object to a predefined type of object) of objects is conducted by a judgement function that computes the compatibility of the measured value with each hypothesis regarding the type of object or subobject each image object is. Finally, an inference engine determines the sequence of rule execution, which is based on a model-driven interpretation with a data-driven verification of hypotheses. However, the real break-through in object-oriented analysis of RS images, that for the first time provided users with an operational tool, was the introduction of the eCognition software at the ISPRS Conference in Amsterdam in summer 2000. eCognition (http://www.definiensimaging.com) is an analysis-specific (no image preprocessing built in) program for 44
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ multiresolution segmentation and object-oriented fuzzy-rule classification, specially suited for very high resolution imagery. It is based upon the Fractal Net Evolution Approach (FNEA) derived from the ideas of the Nobel laureate Gerd Binnig (Klenk, Binnig, & Schmidt 2000). FNEA describes complex image semantics within self-organizing hierarchical networks, in which the structure of each level is similar to the one of the others (hence the ‘fractal’ adjective). Objects derived from the input image change their states (structure and meaning) stepwise according to contextual influences and converge to a semantically coherent hierarchical arrangement through alternating procedures of segmentation and classification (hence the name ‘evolution’). The construction of the hierarchical semantic network of image objects is based initially upon a multiresolution segmentation (Baatz & Schape 2000). A typical analysis may consist in producing two (or more) coordinated partitions, one fine grained (with many small regions of relatively uniform size) and another coarse grained (with larger regions whose mean size exceed the one of the former partition by orders of magnitude), where the boundaries of the coarser partition are formed by boundaries existing in the finer one. Alternatively, the coarser partition can be obtained from a pre-existing GIS vector layer (if e.g. the aim of the analysis is map updating). Each region is considered an image object whose projection on the ground may have a definite meaning for users, either as an entity of its own or as a part of larger structure. Membership of an image object to a class is evaluated by fuzzy membership functions (one for each attribute) that maps class descriptions into the [0,1] real interval according to the typical values in each attribute showed by sample objects of that class. The set of classes and attributes may differ between partitions, with classes corresponding to a higher level of abstraction for the coarser partition. The overall classification scheme is structured in three kinds of hierarchies. In the inheritance hierarchy, class descriptions defined in parent classes are selectively passed down to their child classes, reducing redundancy and complexity in the class descriptions. In the group hierarchy, classes are combined into classes of superior semantic meaning. A class may be part of more than one group, and the grouping of classes may not coincide with the structure of the inheritance hierarchy. Finally, the structure hierarchy put together different classes that may compound complex heterogeneous geographic objects like a city.
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ All these semantic relations can be put to work in either a bottom-up or top-down fashion. In the first case, connected image objects of the finer partition that represent identical structures or that are parts of identical structures are merged into a new image object. In the second, the congruency of subobjects compounding a superobject is evaluated and eventually the hierarchical connection between them may be deallocated and assigned to a neighbouring superobject. Since these changes may affect the semantic value of neighbouring objects at the same or higher level of the object hierarchy, the classification is done iteratively in cycles in which each object is classified over and over taking into account the changes in the classification of networked objects. The result of this process is a network of classified image objects with concrete attributes, concrete relations to each other and concrete relations to the defined classes. The set of initial partitions upon which the classification starts is given by the multiresolution segmentation algorithm due to Baatz and Schape (Baatz & Schape 2000). Likewise Woodcock and Harward’s (1992) method, it starts with individual pixels as the initial segments. It also follows the local mutual best fitting criterion, but instead of using an stepwise increase of the dissimilarity threshold, at each iteration it distributes the candidate pairs to be merged (those having a dissimilarity value smaller than the global threshold) as far as possible from each other over the image, and these locations cannot be close to segments that were merged in the previous iteration. In this way, it achieves a uniform growth of segments throughout the image, so that the final segments have all a similar size. Since a conservative (small) threshold permits fewer merges than a greater one, the mean size of segments will grow with the value of the threshold. For this reason it is called the scale parameter. A hierarchy of increasingly coarser partitions can be obtained by simply raising the scale parameter. Such hierarchy is better suited than the one of Beaulieu and Goldberg (1989) for the study of the landscape at different levels of generalization, since objects at a given level have all roughly the same scale (size). Another particularity of this algorithm is the optional inclusion of a form heterogeneity factor in the overall dissimilarity between two adjacent segments of size n1 and n2. The latter is measured as the change in heterogeneity produced by their eventual merging, i.e. the difference hdiff (weighted by size) between the heterogeneity hm of the potential merger and the ones h1 and h2 of the segments: hdiff = (n1+n2)hm – (n1h1+ n2h2). The overall heterogeneity h is a linear combination of radiometric heterogeneity (expressed by e.g. the mean of the 46
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ variance in each band of pixels within the segment) and form heterogeneity (expressed e.g. by the ratio between factual edge length and the edge of a square with the same number of pixels than the segment). In this way the segmentation favours the construction of regions with smooth edges and a more or less compact form. Although such approach to tackling the fractal nature of the landscape is conceptually weak, the results are visually appealing. Another inconsistency is the consideration of individual pixels as the initial image objects, since they are artificial units whose shape is in no way related with the spatial distribution of landscape objects. An already proposed alternative (e.g. (Blaschke & Hay 2001)), which will be followed in this thesis, is to detect blobs with some morphological method and use the resulting fine partition as the input for region merging. Morphological segmentation, in contrast to the statistical one, is based on the spatial structure of the image, and has the watershed transform (see 3.7) as its cornerstone. Meyer ( 2001) gives an unified overview of morphological segmentation, a set of powerful techniques customarily used in computer vision and medical radiology that have been hardly applied to Remote Sensing but that will shape in all likelihood future work in this discipline. To summarize, efforts directed towards object orientation in Remote Sensing have followed a windy road with many comings and goings. The need for using distinct uniform regions instead of individual pixels was acknowledged since long ago as the only way to tackle the unevenly distributed heterogeneity of landscapes. However, many initiatives got stuck on the piecewise homogeneous model, which neglects the fact that landscape heterogeneity is hierarchically structured. The late acknowledgment of the scale dependency of most landscape attributes and concepts led to the emergence of the hierarchical patch paradigm both in remote sensing (Woodcock & Harvard V.J 1992) and landscape ecology (Wu & Loucks 1995). The natural way of making operational this patch concept is through objectoriented modelling, where each object is a structural-functional unit (a patch) at a given scale that is loosely coupled with both objects of the same level and objects forming part of it or encompassing it. So far, image segmentation seems to be the only operational solution providing a starting point to object-oriented analysis. But most methods have been developed heuristically without a deeper examination of the semantic implications of the segmentation process. As Bittner and Winter (Bittner & Winter 1999) point out, ‘for a better understanding of the relationship between objects of the real world and their representations, a better understanding of the underlying ontological and epistemological foundations is necessary’. Chapter 2 aims to contribute to this understanding. 47
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________
1.13. Motivation, objectives and main contributions A historical view would help to stress the main points. The first spaceborne sensors were multispectral to compensate for the reduced spatial resolution of the data, with the hope that different landcover types would behave like distinct materials susceptible of being analysed with a spectrometric approach. Hence it was natural to consider each pixel as a sample introduced in a desktop spectrometer, and therefore the individual pixel has been considered the basic unit of the analysis since the beginning of EO. The fact that these samples do not come separately (rather they are knitted into an image full of spatial patterns) was neglected and even considered of little use, except for photointerpretation. Several classification methods were developed based on this approach and soon (even before the launch of Landsat1) they were established as the trusted common practice, becoming the accepted paradigm in the analysis of EO data. The initial resolution of these data (80 m) was compatible with the spatial resolution of most classification schemes (see 1.11), but as the technical developments enabled smaller pixel sizes, the radiometric variability of surface features increased. Therefore there was a need to incorporate simple spatial characteristics as adjuncts to the spectral ones (Landgrebe 1980), and textural features were the most successful candidates, enhancing somehow the accuracy of classification (Haralick 1979). This relative success, together with the use of smoothing filters that got rid of the (wrongly called) ‘scene noise’, allowed the paradigm to survive by tacitly forgetting the basic premise about the spatial resolution. Other approaches pointing towards object orientation (image segmentation followed by segment classification) were suggested as early as 1976 (Kettig & Landgrebe 1976), but not surprisingly they remained ignored until recently. On the one hand, they were outside the paradigm frame, and on the other, they were rightly criticised for several reasons, among others, the dependence on seed pixels and/or merging sequence, the need for used-defined parameters, and the lack of theoretical basis. The advent, in the beginning of this century, of civilian very high resolution multispectral satellites of the like of Ikonos and Quickbird, has brought into a sharp relief the inadequacy of pixel based analysis when the pixel size is smaller than the classification disk. Neither does texture play the same role than in Landsat-like imagery, for the unresolved elements are in 48
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ this case leaves or tiles rather than tree crowns or roofs. Simultaneously, segmentation techniques for grey-level images improved significantly as a result of research efforts in the fields of computer vision and medical radiology. However, these results have hardly been transferred to remote sensing, mainly due to the lack of definite shape (and even crisp boundaries) of the objects of interest, and to the multiband and multiscale nature of the images (Schiewe, Tufte, & Ehlers 2001). As a point of fact, up to now there is only a commercial software (eCognition) devoted to object-oriented image analysis. Although most users have been impressed by its results, the multiscale segmentation algorithm embedded in this software lacks an explicit theoretical framework, and the users have to find useful segmentation levels in a trial and error basis (Blaschke & Hay 2001). On the other hand, there has been little progress in segmentation of colour or multiband imagery (Kartikeyan, Sarkar, & Majumder 1998), due the relative lack of interest of the remote sensing community and to the monochrome nature of most imagery from the other disciplines. The more practitioners are aware of the inconsistencies of spectrometric methods, the more urgent the need for a new paradigm. The emerging object-oriented paradigm has being around for a good while, but in order to achieve a full conversion towards it, the concept of class should also be altered in order to make it compatible with the hierarchical patch model. To the best of my knowledge, no one has already put forward a solid conceptual basis for the new paradigm, neither given an explicit account on the implications of class-concepts in the analysis of RS images for landcover mapping. By shifting the concept of classes towards types of geographic objects, a new key concept appears as the basic unit of the analysis: the granule, or basic mappable zone. Finally, a general method of segmentation should be developed in order to change once and for all the way users analyse RS images for landcover mapping. The ultimate goal of this thesis is to make a significant contribution to this shift. Keeping in with the last statement, the purpose of this thesis is threefold: i) To expose the inadequacy of the spectrometric approach to image classification as a means for landcover mapping (sections 1.11 and 2.7).
G.Castilla’s Ph.D. Thesis. Chapter 1: INTRODUCTION_____________________________ ii) To construct a conceptual framework for an alternative approach that seizes the spatial structure of the image and that is based on basic mappable zones, or granules (chapter 2). iii) To develop an implement a tentative version of a general automated method to derive from a multiband image a partition consisting of granules (Chapter 3). The main contributions of this thesis, listed by objective, can be summarized as follows: 1) Related to the first objective: ? A re-examination of the main critiques to pixel-based image classification under a kuhnian (Kuhn 1962) perspective, in which the former is viewed as nested on a wider paradigm, the spectrometric approach, which is based on waveform discrimination. ? The identification of Goodchild’s (1994) notion of spatial resolution of a classification as the key concept to understand of the failure of pixel-based methods in high resolution imagery. ? The exposure of the conceptual incompatibility of spectrometric methods with the object-oriented paradigm when the latter is nested within the hierarchical patch model. 2) Related to the second objective: ? The identification of the need to shift class concepts from types of materials to types of geographic objects in order to make compatible the classification process with the hierarchical patch model. ? The application Thom’s (1975) theory of attractors to the morphology of geographic fields. In particular, the identification of this theory as a suitable conceptual basis to define a primal partition of an image. ? The introduction of the granule (a region different from its surroundings and larger than the MMU size) as the basic unit for Object-Oriented Classification of RS Images for landcover Mapping (OOCIM). ? The integration of several ontological and epistemological tools into a 3-tiered (commonsensical reality, geographic fields and classified objects)