J. Symanzik*, D. A. Axelrad, D. B. Carr, J. Wang, D. Wong, T. J. Woodruff
* Jürgen Symanzik
George Mason University
Center for Computational Statistics 4A7
Fairfax, VA 22030
e-mail: symanzik@galaxy.gmu.edu
Abstract
In this paper we report on an approach to visualize the U.S. Environmental Protection Agency (EPA) hazardous air pollutant (HAP) data on the World Wide Web (WWW). Long-term cumulative concentrations for 148 HAPs have been estimated as a part of EPA's Cumulative Exposure Project (URL:CumulativeExposureProject) for each of the 60,803 census tracts in the 48 contiguous states. Confidence bounds are available for this data set as well.
Our WWW approach is based on the Graphics Production Library (GPL) (Carr, Valliant, and Rope, 1996) and extends it with micromaps (Carr and Pierson, 1996; Carr, Olsen, Courbois, Pierson, and Carr, 1998). The GPL is a Java-based application, developed and maintained within the Bureau of Labor Statistics (BLS), that displays statistical data as row-labeled plots and time series plots on the WWW. This data visualization approach closely follows recent recommendations on statistical graphics by Carr (1994) and Cleveland (1993, 1994). Micromaps are a series of small generalized maps that highlight the geographical region associated with statistical features in accompanying plots.
When added to the GPL, micromaps serve two purposes. In addition to providing the geographical component of a linked statistical summary, they serve as a navigation tool for drilling down through a hierarchy of maps. A WWW user of our application can click on a state and advance to the underlying county-within-state micromap display. Selection of a county leads to information at the census tract level. This hierarchy of clickable maps and statistical displays allows the WWW user to look at estimates and distributional summaries at the desired geographic resolution. Atypical values in the statistical summaries may lead the investigator to points of interest at a higher spatial resolution level.
Additional interaction through the Web browser is possible. The user can easily toggle between individual HAPs, look at formatted tabular data, and display and download the raw unformatted data.
1. Introduction
This paper reports on an approach to access and visualize the U.S. Environmental Protection Agency (EPA) hazardous air pollutant (HAP, also called air toxics) data on the World Wide Web (WWW). The Cumulative Exposure Project (URL:CumulativeExposureProject) Web page (Figure 1) has been designed and implemented to provide insight into the scope and the underlying modeling process of the project. A major goal of the Web page is to provide access to the modeled 1990 HAP data at different spatial resolutions.
The Cumulative Exposure Project Web site has scheduled updates. A first version of the Cumulative Exposure Project Web page went online in March 1998 to provide background information on the project. An updated version that provides access to the modeled 1990 HAP data at different spatial resolutions is scheduled for release in December 1998. Additional graphical components will be added to the page in January 1999.
In the Cumulative Exposure Project, EPA has developed modeled concentration estimates of 148 air toxics for every census tract in the continental United States. Air toxics are pollutants known or suspected to cause cancer and other serious human health effects. The modeled concentrations are annual averages for the year 1990, and uncertainty bounds have been developed for each estimate. Development of the modeled concentrations for the 60,803 census tracts in the continental U.S. is described in Rosenbaum, Axelrad, Woodruff, Wei, Ligocki, and Cohen (1999), and analysis of the modeled concentrations is presented in Woodruff, Axelrad, Caldwell, Morello-Frosch, and Rosenbaum (1998) and Caldwell, Woodruff, Morello-Frosch, and Axelrad (1998).
The purpose of the Web page is to provide easy, fast, and understandable access to the model results. Since the results are numerous, involving 148 x 60,803 x 3 numeric values, we have taken a hierarchical spatial approach in their display. The WWW user starts at the top (US) level with data at the state level being displayed. Selecting a state causes its county level estimates to be shown. Selecting a county causes its underlying census tract level to be displayed. It is possible to toggle between individual HAPs at any stage. Mouse-clickable maps, menus, and geographical names in the Web document allow the user to easily move up and down in this hierarchy of maps and displays. Thus, the user can look at data and summary statistics at any desired spatial resolution.
In Section 2 of this paper we introduce micromaps and the Graphics Production Library (GPL) and explain their joint use in the Cumulative Exposure Project Web page. Section 3 focuses on the tabular displays of the Web page. Section 4 concludes with a discussion of achievements so far and mentions possible extensions. Additional details on the Cumulative Exposure Project and the related Web page can be found in Symanzik, Wong, Wang, Carr, Woodruff, and Axelrad (1999).
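The size of the result set and the state, county, census tract navigation described in this section can be illustrated with a minimal Python sketch. It is not the implementation behind the Web page: the function `drill_down`, the nested dictionary, and the tract identifiers are hypothetical, and reading the factor 3 as one estimate plus two uncertainty bounds is an assumption.

```python
# Hypothetical sketch: names and the tiny sample hierarchy are illustrative only.
N_HAPS = 148
N_TRACTS = 60_803
VALUES_PER_ESTIMATE = 3  # assumed: estimate plus lower and upper uncertainty bound

# Total number of numeric values behind the displays (148 x 60,803 x 3).
total_values = N_HAPS * N_TRACTS * VALUES_PER_ESTIMATE

# state -> county -> census tract identifiers (the tract ids are made-up placeholders)
hierarchy = {
    "Rhode Island": {
        "Bristol": ["tract-A", "tract-B"],
        "Kent": ["tract-C"],
    },
    "Michigan": {
        "Ingham": ["tract-D", "tract-E", "tract-F"],
    },
}


def drill_down(hierarchy, state=None, county=None):
    """Return the region names shown at the requested level of the hierarchy."""
    if state is None:
        return sorted(hierarchy)            # top (US) level: list of states
    if county is None:
        return sorted(hierarchy[state])     # state level: list of counties
    return list(hierarchy[state][county])   # county level: list of census tracts


print(total_values)                                   # 26996532
print(drill_down(hierarchy))                          # states
print(drill_down(hierarchy, "Rhode Island"))          # counties of one state
print(drill_down(hierarchy, "Rhode Island", "Bristol"))  # tracts of one county
```

With roughly 27 million values in total, showing only one level of one branch at a time keeps each display small, which is the motivation for the hierarchical approach.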
2. Linked Micromap Plots and the GPL
Linked micromap (LM) plots, often simply called micromaps, provide a new way of viewing spatially indexed estimates and summary statistics. Carr and Pierson (1996) and Carr, Olsen, Courbois, Pierson, and Carr (1998) provide the basic descriptions and several examples, and cite the connections of LM plots to other statistical graphics. An LM plot consists of parallel sequences of micromap panels, label panels, and statistical summary panels. The micromap panels are typically map generalizations or caricatures. The caricatures preserve region neighbors and enlarge very small regions so their color is visible. The label panels provide region names. The statistical summary panels represent estimates, confidence bounds, and related information. Individual panels take familiar forms such as dot plots, bar plots, and box plots. The representation typically uses the encoding that has the highest perceptual accuracy of extraction, i.e., the position along a scale (Cleveland and McGill, 1984). The sequence of panels results from sorting and logical or perceptual grouping. Color and position link corresponding elements within the parallel sequences.
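The sorting, grouping, and color linking just described can be sketched in a few lines of Python. This is only an illustration of the idea, not code from the GPL: the function `build_lm_panels`, the panel size of five, and the color names are assumptions.

```python
# Illustrative palette; the same few colors are reused within every panel.
PANEL_COLORS = ["red", "orange", "green", "blue", "purple"]


def build_lm_panels(estimates, panel_size=5):
    """Sort regions by estimate (descending) and split them into panels.

    Each row records a region, its estimate, and a color. Drawing the region
    in that color in the micromap, label, and statistical summary panels is
    what links the parallel sequences.
    """
    ordered = sorted(estimates.items(), key=lambda item: item[1], reverse=True)
    panels = []
    for start in range(0, len(ordered), panel_size):
        group = ordered[start:start + panel_size]
        panels.append([
            {
                "region": name,
                "estimate": value,
                "color": PANEL_COLORS[i % len(PANEL_COLORS)],
            }
            for i, (name, value) in enumerate(group)
        ])
    return panels


# Made-up estimates for seven hypothetical regions.
sample = {"A": 3.0, "B": 1.0, "C": 2.0, "D": 5.0, "E": 4.0, "F": 0.5, "G": 6.0}
for panel in build_lm_panels(sample):
    print([(row["region"], row["color"]) for row in panel])
```

Because colors repeat from panel to panel, a color identifies a region only within its own panel, so position in the sequence is needed alongside color to link the rows.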
The design of LM plots contrasts with the design of classed choropleth maps. In LM plots, the maps are caricatures but sufficiently convey the location of the regions associated with the statistical summary. The statistical summaries have high perceptual accuracy of extraction. Classed choropleth maps use the best representation, i.e., the position along a scale, and most of the space to represent political boundaries. The classed choropleth map design typically discards confidence bounds on statistical estimates, degrades the estimates into a few classes, and represents the ordered classes with a poor encoding, usually color. Thus the LM plot representation places more emphasis on the estimates and their quality. Knowing the census tract, county, or state name and the general location is usually good enough for an interpretation of the statistical data.
The Graphics Production Library (GPL) is a set of Java class libraries for interactive statistical graphics (Carr, Valliant, and Rope, 1996). The GPL was initially intended to add interactivity, such as drag and drop comparisons, panel reordering and rescaling, and pan and zoom, to the row-labeled plots of Carr (1994). It also follows recent recommendations on statistical graphics as given in Cleveland (1993, 1994). The design of the GPL also addressed the display of time series and provided for incorporation of metadata, such as warning flags on the time series and links to articles on the time series adjustments. The library was developed to facilitate Web distribution of statistical summaries from the Bureau of Labor Statistics (BLS). The GPL, as maintained by the BLS, provided a reasonable starting point for incorporating the micromap capabilities and producing LM plots on the WWW.
Even though LM plots will not be fully available on the Cumulative Exposure Project Web page before January 1999, we can provide a first idea of what these LM plots will look like. Figure 2 shows a micromap at the top (US) level with states sorted in alphabetical order. The total modeled 1990 HAP concentration of all 148 HAPs, aggregated over the census tracts in each state, is displayed. In our final version on the Web, this display will allow the user to click on a state and move to the next underlying level. As an example, Figure 3 shows a micromap for Michigan. Again, the total modeled 1990 concentration of all 148 HAPs, aggregated over the census tracts in each county, is displayed. Although we used the total modeled concentration as a summary for all HAPs in these two figures, this information will not be provided on the Web page. The total says little about toxicity: a relatively small quantity of one air toxic may be far more poisonous than a much larger quantity of another. Displaying the total publicly on the Cumulative Exposure Project Web page might easily lead to wrong conclusions.
3. Tabular Displays
In addition to the graphical display based on LM plots and the GPL, we can access our data through formatted data tables. These tables are available at the same spatial resolutions as the LM plots. We start at the top (US) level, where data are displayed for one selected HAP, here benzene, for all 48 states and the District of Columbia (Figure 4). Summary statistics with respect to the selected HAP are displayed.
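The summary statistics listed in the caption of Figure 4 (number of census tracts, mean, median, minimum, 25th percentile, 75th percentile, and maximum) can be computed from tract-level values as in the following Python sketch. The function name `tract_summary` and the sample values are made up, and the percentile method is an assumption, since the paper does not state which definition was used.

```python
import statistics


def tract_summary(concentrations):
    """Summarize tract-level concentrations for one region and one HAP.

    Needs at least two values. Percentiles use the "inclusive" method of
    statistics.quantiles; other definitions give slightly different results.
    """
    data = sorted(concentrations)
    p25, _, p75 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "n_tracts": len(data),
        "mean": statistics.fmean(data),
        "median": statistics.median(data),
        "min": data[0],
        "p25": p25,
        "p75": p75,
        "max": data[-1],
    }


# Made-up values: a symmetric set and a strongly right-skewed set.
print(tract_summary([5, 1, 4, 2, 3]))
print(tract_summary([1, 1, 1, 1, 100]))
```

In the second, skewed sample the mean lies above the 75th percentile, which is the pattern the caption of Figure 2 points out for some states.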
We can now select a new state either by changing the menu option for States on top of the table or by clicking on the desired state. In our case, we decided to look at data for Rhode Island. The resulting table is displayed in Figure 5. The selection menu on top reflects this state selection, allowing us to select one of Rhode Island’s five counties. The same summary statistics as in the previous figure are displayed for the selected HAP, this time for each county rather than for each state.
Finally, we can select a single county either by changing the menu option for Counties on top of the table or by clicking on the desired county. In our case, we decided to look at data for Rhode Island’s Bristol County. The resulting table is displayed in Figure 6. Again, the selection menu on top reflects this state/county selection. Now the modeled 1990 concentration of benzene and a corresponding 90% confidence interval are displayed in the table for each census tract.
At any time, the user can toggle to a different HAP, a different representation (micromaps or raw data), or move up and down in the spatial hierarchy.
The final display type, the raw data representation (Figure 7), gives an unrestricted view of the raw data as they come out of the modeling process. In particular, no rounding takes place. This representation is best used to download data for one or multiple regions in order to perform additional calculations or display the data in a different format. The data are comma-delimited and can be transferred by copy/paste from the Web browser window into any application window. At the next stage of this Web page, it is planned to provide access to entire data files at a higher spatial resolution to ease further usage of the data.
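As an illustration of how comma-delimited text pasted from the browser might be processed further, the following Python sketch parses a small sample and computes the width of each interval. The column names, the header row, and the sample values are assumptions made for this example; the actual layout of the raw data display may differ.

```python
import csv
import io

# Made-up sample in an assumed layout: tract id, estimate, lower and upper bound.
pasted = """tract_id,estimate,lower,upper
T001,1.25,0.5,3.0
T002,2.0,0.75,4.5
"""


def parse_raw(text):
    """Parse comma-delimited text into a list of records with numeric fields."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "tract_id": row["tract_id"],
            "estimate": float(row["estimate"]),
            "lower": float(row["lower"]),
            "upper": float(row["upper"]),
        }
        for row in reader
    ]


records = parse_raw(pasted)

# Example of an additional calculation: width of each interval.
widths = {r["tract_id"]: r["upper"] - r["lower"] for r in records}
print(widths)
```

Because no rounding takes place in the raw data display, derived quantities such as these widths are computed from the full-precision model output rather than from the formatted table values.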
4. Discussion and Outlook
When finalized in January 1999, the Cumulative Exposure Project Web page will provide an easy-to-use and up-to-date interface for access and display of a large environmental data set at different spatial resolutions. The combination of LM plot design and the GPL provides the basis of a hierarchical clickable approach to the display of spatially indexed estimates and summaries. The hierarchical spatial approach is intuitive and has been easily understood by WWW users with little statistical experience. The approach provides a visual query language that gives access to summary tables and raw data. Future work will add more user preferences and allow for additional analysis of the data.
This work promotes the display of confidence (or uncertainty) bounds. Such bounds are regularly reported with professional political polls. The presence of such bounds suggests the use of appropriate statistical methodology, and the absence of bounds raises the question of whether the reported estimates have any statistical validity. The modeling of HAPs results in wide uncertainty bounds. The Web page provides background on these modeled bounds. While methods for calculating bounds are subject to debate and refinement, publishing wide bounds reduces the chances that the reader will be badly led astray by poor estimates.
There are several possibilities for additions to this Web page after the completion of the micromap display in January 1999. Currently, it is not possible to access the modeled 1990 concentrations for all 148 HAPs for a particular location (state, county, or census tract) at the same time. Instead, one has to toggle through 148 different Web pages to learn about all the HAPs at one location. One goal is to make these data available at a glance, which requires displaying data of very different dimensionalities.
Also, there is currently no easily accessible site on the Web that maps a particular ZIP code to the census tract numbers that are used throughout the Cumulative Exposure Project. We plan to add such a mechanism to our Web page.
At this stage, only the modeling of the 1990 air toxics is finished. Once available, we plan to disseminate the modeled data for exposure levels for chemical contaminants found in public and private drinking water supplies and exposures to contaminants in foods in a similar way through this Web page.
Acknowledgments
EPA funded the majority of the work behind this paper under contract No. 098272 and cooperative agreement No. CR825564-01-0. Additional federal agencies, BLS and NCHS, supported some facets of this work. The article has not been subject to review by any of these agencies, so it does not necessarily reflect the views of the agencies, and no official endorsement should be inferred. The conclusions and opinions are solely those of the authors.
References
Caldwell, J. C., Woodruff, T. J., Morello-Frosch, R., and Axelrad, D. A. (1998), “Application of Health Information to Hazardous Air Pollutants Modeled in EPA's Cumulative Exposure Project”, Toxicology and Industrial Health, Vol. 14, No. 3, pp. 429-454.
Carr, D. B. (1994), “Converting Tables to Plots”, Technical Report 101, Center for Computational Statistics, George Mason University, Fairfax, VA.
Carr, D. B., Olsen, A. R., Courbois, J. P., Pierson, S. M., and Carr, D. A. (1998), “Linked Micromap Plots: Named and Described”, Statistical Computing and Statistical Graphics Newsletter, Vol. 9, No. 1, pp. 24-32.
Carr, D. B., and Pierson, S. M. (1996), “Emphasizing Statistical Summaries and Showing Spatial Context with Micromaps”, Statistical Computing and Statistical Graphics Newsletter, Vol. 7, No. 3, pp. 16-23.
Carr, D. B., Valliant, R., and Rope, D. (1996), “Plot Interpretation and Information Webs: A Time-Series Example from the Bureau of Labor Statistics”, Statistical Computing and Statistical Graphics Newsletter, Vol. 7, No. 2, pp. 19-26.
Cleveland, W. S., and McGill, R. (1984), “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods”, Journal of the American Statistical Association, Vol. 79, pp. 531-554.
Cleveland, W. S. (1993), Visualizing Data, Hobart Press, Summit, NJ.
Cleveland, W. S. (1994), The Elements of Graphing Data, Hobart Press, Summit, NJ.
Rosenbaum, A. S., Axelrad, D. A., Woodruff, T. J., Wei, Y.-H., Ligocki, M. P., and Cohen, J. P. (1999), “National Estimates of Outdoor Air Toxics Concentrations”, Journal of the Air & Waste Management Association, (In Press).
Symanzik, J., Wong, D., Wang, J., Carr, D. B., Woodruff, T., and Axelrad, D. (1999), “WWW-based Access and Visualization of Hazardous Air Pollutants”, Proceedings GIS in Public Health Conference, August 17-20, 1998, San Diego, California (In Preparation).
Woodruff, T. J., Axelrad, D. A., Caldwell, J. C., Morello-Frosch, R., and Rosenbaum, A. S. (1998), “Public Health Implications of 1990 Air Toxics Concentrations Across the United States”, Environmental Health Perspectives, Vol. 106, No. 5, pp. 245-251.
URL:CumulativeExposureProject http://www.epa.gov/CumulativeExposure/
Figures
Figure 1: The starting page of the Cumulative Exposure Project Web page, accessible at http://www.epa.gov/CumulativeExposure/.
Figure 2: Micromap display at the top (US) level for all 48 states and the District of Columbia. The total modeled 1990 HAP concentration in micrograms per cubic meter for all 148 HAPs has been averaged over the number of census tracts in each state. The 25th and 75th percentiles are displayed as well. For some states, e.g., Vermont, the distribution of HAP concentrations is extremely skewed, which becomes obvious when the displayed arithmetic mean falls above (or below) the 75th (or 25th) percentile. The accompanying second data column (which will not be available on the Web page) indicates the percentage of urban census tracts. A strong correlation between the two displayed variables can be visually detected.
Figure 3: Micromap display at the state level for Michigan for all of its 83 counties. The total modeled 1990 HAP concentration in micrograms per cubic meter for all 148 HAPs has been averaged over the number of census tracts in each county. Counties are ordered from highest to lowest with respect to the total modeled 1990 HAP concentration.
Figure 4: Tabular display at the top (US) level for all 48 states and the District of Columbia. The currently invisible states can be displayed by scrolling down the slider on the right-hand side of the table. Displayed is the modeled 1990 concentration for benzene in micrograms per cubic meter. Summary statistics include number of census tracts, mean, median, minimum, 25th percentile, 75th percentile, and maximum with respect to the underlying census tracts for each state.
Figure 5: Tabular display at the state level for Rhode Island for all of its 5 counties. Displayed is the modeled 1990 concentration for benzene in micrograms per cubic meter. Summary statistics include number of census tracts, mean, median, minimum, 25th percentile, 75th percentile, and maximum with respect to the underlying census tracts for each county.
Figure 6: Tabular display at the county level for Rhode Island’s Bristol County for all of its 12 census tracts. Displayed is the modeled 1990 concentration for benzene in micrograms per cubic meter. In addition, a 90% confidence interval is displayed.
Figure 7: Raw data display at the county level for Rhode Island’s Bristol County for all of its 12 census tracts. Displayed is the modeled 1990 concentration for benzene in micrograms per cubic meter. In addition, a 90% confidence interval is displayed.