
Davern M.,University of Chicago | Blewett L.A.,University of Minnesota | Lee B.,Minnesota Population Center | Boudreaux M.,2221 University Ave | King M.L.,Minnesota Population Center
Epidemiologic Perspectives and Innovations | Year: 2012

The Integrated Health Interview Series (IHIS) is a public data repository that harmonizes four decades of the National Health Interview Survey (NHIS). The NHIS is the premier source of information on the health of the U.S. population. Since 1957 the survey has collected information on health behaviors, health conditions, and health care access. The long-running time series of the NHIS is a powerful tool for health research. However, efforts to fully utilize its time span are obstructed by difficult documentation, unstable variable and coding definitions, and non-ignorable sample re-designs. To overcome these hurdles, the IHIS, a freely available and web-accessible resource, provides harmonized NHIS data from 1969-2010. This paper describes the challenges of working with the NHIS and how the IHIS reduces such burdens. To demonstrate one potential use of the IHIS, we examine utilization patterns in the U.S. from 1972-2008. © 2012 Davern et al; licensee BioMed Central Ltd.


Saporito S.,College of William and Mary | van Riper D.,Minnesota Population Center | Wakchaure A.,University of Southern California
URISA Journal | Year: 2013

The School Attendance Boundary Information System (SABINS) is a social science data infrastructure project that assembles, processes, and distributes spatial data delineating K through 12th grade attendance boundaries for thousands of school districts in the United States. Until now, attendance boundary data have not been readily available at a large scale or in an easy-to-use format. SABINS removes these barriers by linking spatial data delineating attendance boundaries with tabular data that describe the demographic characteristics of populations living within those boundaries. This paper explains why a comprehensive GIS database of K through 12 attendance boundaries is valuable, how original spatial information delineating attendance boundaries is collected from local agencies, and which techniques are used for modeling and storing the data so they provide maximum flexibility to the user community. The goal of this paper is to share the techniques used to assemble the SABINS database so that federal, state, and local agencies can apply a standard set of procedures and models as they gather data for their regions.


Esteve A.,Centre d'Estudis Demogràfics | McCaa R.,Minnesota Population Center | Lopez L.A.,University of Costa Rica
Population Research and Policy Review | Year: 2013

The explosive expansion of non-marital cohabitation in Latin America since the 1970s has led to the narrowing of the gap in educational homogamy between married and cohabiting couples (what we call the "homogamy gap"), as shown by our analysis of 29 census samples encompassing eight countries: Argentina, Brazil, Chile, Colombia, Costa Rica, Ecuador, Mexico, and Panama (N = 2,295,160 young couples). Most research on the homogamy gap is limited to a single decade and a small group of developed countries (the United States, Canada, and Europe). We take a historical and cross-national perspective and expand the research to a range of developing countries, where since early colonial times, traditional forms of cohabitation among the poor, uneducated sectors of society have coexisted with marriage, although to widely varying degrees from country to country. In recent decades, cohabitation has been emerging in all sectors of society. We find that among married couples, educational homogamy continues to be higher than for those who cohabit, but in recent decades the difference has narrowed substantially in all countries. We argue that assortative mating between cohabiting and married couples tends to be similar when the contexts in which they are formed are also increasingly similar. © 2012 Springer Science+Business Media Dordrecht.


News Article
Site: http://www.nature.com/nature/current_issue/

Our historical weather data came from two sources: the Global Surface Summary of the Day (GSOD) data set, version 7 (https://data.noaa.gov/dataset/global-surface-summary-of-the-day-gsod; accessed 22 July 2014)21, and the US Historical Climatology Network (USHCN), version 2.5 (http://www.ncdc.noaa.gov/oa/climate/research/ushcn/; accessed 22 September 2015)22, both maintained by the National Centers for Environmental Information. Block group-level and county-level US Census data, including geographical boundary data, came from the Minnesota Population Center's National Historical Geographic Information System, version 2.0 (https://data2.nhgis.org/main; accessed 30 July 2014)31. We obtained county-level monthly temperature projections from the National Climate Change Viewer (NCCV) (http://www.usgs.gov/climate_landuse/clu_rd/nccv.asp; accessed through direct communication with J. Alder on 20 July 2015 and 3 December 2015)26, 27, a US Geological Survey product that takes downscaled climate scenarios prepared by NASA (the National Aeronautics and Space Administration) and averages the 800-m gridded temperature data to the county level. Our international projections data came from the Royal Netherlands Meteorological Institute's Climate Change Atlas (http://climexp.knmi.nl/plot_atlas_form.py; accessed 11 September 2015)32.

We limited our analysis to data from weather stations in the contiguous United States that operated continuously between 1974 and 2013. This 40-year period is long enough to minimize sensitivity to natural variability in weather data, and it begins at a point in time when the number of weather stations included in standard data sets and the completeness of the data they reported both increased. The timespan covers the entire history of Americans' exposure to the idea of climate change, allowing us to track how weather has shifted during the time when the public might have perceived such shifts as attributable to climate change.

Daily weather data on temperature and humidity came from the GSOD data set, produced by the National Centers for Environmental Information from hourly weather station observations contained in the Integrated Surface Hourly data set21. Of the various land-based weather station data sets that offer daily summary data, GSOD is the only one that includes weather records necessary to measure a location's relative humidity, which the urban economics literature has shown to be an important climate amenity driving regional population growth. Temperature and humidity records in our data set came from the GSOD's daily station records of mean and maximum temperatures and mean dew point temperature (from which we calculated daily relative humidity and, in turn, daily heat index values). We included in the study only those GSOD stations reporting valid data on each of these weather indicators for at least 50% of the days in each of the 480 months of our study period, reducing the total number of stations in the analysis from 672 to n = 324 (Supplementary Table 1). Raising the threshold for valid data from 50% to 75% produced similar results (Extended Data Table 4a).

Our final data set included a small number (n = 34) of stations that were relocated to nearby sites at some point between 1974 and 2013. In each case, the site location changed no more than 10 m in elevation and 0.1 decimal degree in latitude or longitude, and data reported by the relocated stations covered the entire study period with no more than a 15-day gap. Running our main analysis after omitting these relocated stations produced similar results (Extended Data Table 4b).
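As an illustration of the relative humidity and heat index calculations mentioned above, the sketch below derives both quantities from a GSOD daily mean temperature and mean dew point. The Magnus approximation and the National Weather Service's Rothfusz regression used here are assumptions for illustration; they may differ in detail from the standard formulas the paper cites (refs 38, 39).

```python
import math

def relative_humidity(temp_f: float, dewpoint_f: float) -> float:
    """Approximate relative humidity (%) from air and dew point temperature in deg F,
    using the Magnus formula (an assumption; the paper's ref. 38 may differ)."""
    t_c = (temp_f - 32.0) * 5.0 / 9.0
    td_c = (dewpoint_f - 32.0) * 5.0 / 9.0
    a, b = 17.625, 243.04
    e_sat = math.exp(a * t_c / (b + t_c))    # saturation vapour pressure (common factor cancels)
    e_act = math.exp(a * td_c / (b + td_c))  # actual vapour pressure
    return 100.0 * e_act / e_sat

def heat_index(temp_f: float, rh: float) -> float:
    """Heat index (deg F) via the NWS Rothfusz regression (an assumption; ref. 39 may differ).
    The regression applies roughly above 80 deg F and 40% humidity; otherwise return temp_f."""
    if temp_f < 80.0 or rh < 40.0:
        return temp_f
    return (-42.379 + 2.04901523 * temp_f + 10.14333127 * rh
            - 0.22475541 * temp_f * rh - 6.83783e-3 * temp_f ** 2
            - 5.481717e-2 * rh ** 2 + 1.22874e-3 * temp_f ** 2 * rh
            + 8.5282e-4 * temp_f * rh ** 2 - 1.99e-6 * temp_f ** 2 * rh ** 2)

# Example: a GSOD daily record with mean temperature 88 F and mean dew point 72 F.
rh = relative_humidity(88.0, 72.0)
print(round(rh, 1), round(heat_index(88.0, rh), 1))
```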
Synoptic reporting of weather conditions in the GSOD data set introduces error in the calculation of daily precipitation indicators, so our main analyses employ daily precipitation records from weather stations in the USHCN, a designated subset of the National Oceanic and Atmospheric Administration's Cooperative Observer Program Network22. Sites are chosen for inclusion in the USHCN according to their spatial coverage, record length, data completeness, and historical stability. USHCN records are subject to rigorous quality control checks and have been demonstrated to be less error-prone than the GSOD33. We included in the study only those USHCN stations for which at least 90% of daily precipitation data were available in no fewer than 95% of the 480 months of our study period, reducing the total number of stations in the analysis from 1,218 to n = 601. This reduced the share of days in the USHCN data set with missing precipitation data to 1.2%.

Because any time-dependent missing daily precipitation data could potentially affect our measurements of annual total precipitation and precipitation days, we used a procedure for simulating the occurrence of precipitation on missing data days that has been employed in leading research on over-time precipitation trends34, 35. All simulations were conducted at the station-by-month level. For any day with missing precipitation data, we first employed a random-number generator to simulate whether precipitation occurred by using the observed frequency of precipitation within the station-month over the 40-year period of our analysis. We fitted a separate gamma distribution, which has been shown to realistically represent precipitation processes, to each station's daily precipitation by month (for a total of 601 × 12 = 7,212 distributions), using only months for which the station had complete data. A random draw from the fitted distribution was then used to simulate missing daily precipitation for any day in the station-month on which precipitation was simulated to occur. As a robustness check, we carried out the same analysis using GSOD precipitation data; results were similar (Extended Data Table 4c).
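A minimal sketch of the missing-day precipitation simulation described above, assuming the station-month records are available as simple arrays; the function name, data layout and use of SciPy's gamma fit are illustrative rather than the authors' code.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)

def simulate_missing_precip(observed, n_missing):
    """Simulate daily precipitation totals for missing days within one station-month.

    observed  : valid daily precipitation totals for that station and calendar month,
                pooled across the 40-year period (the paper fits the gamma using only
                months with complete data; here all valid days are pooled for simplicity).
    n_missing : number of missing days to simulate.
    Returns an array of simulated daily totals (0.0 on simulated dry days).
    """
    observed = np.asarray(observed, dtype=float)
    wet_freq = np.mean(observed > 0)          # observed frequency of precipitation
    wet_amounts = observed[observed > 0]

    # Gamma distribution fitted to positive daily amounts, location fixed at zero.
    shape, _, scale = stats.gamma.fit(wet_amounts, floc=0)

    # Step 1: Bernoulli draw for whether precipitation occurs on each missing day.
    occurs = rng.random(n_missing) < wet_freq
    # Step 2: draw amounts from the fitted gamma for the days simulated as wet.
    amounts = stats.gamma.rvs(shape, scale=scale, size=n_missing, random_state=rng)
    return np.where(occurs, amounts, 0.0)

# Illustrative use: one station-month series with three missing days.
example_month = np.concatenate([np.zeros(500), rng.gamma(0.8, 0.3, size=220)])
print(simulate_missing_precip(example_month, n_missing=3))
```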
To estimate population exposure to weather conditions, we used a method employed by health geographers that weights weather station observations based on their distance from population centroids of US counties36, 37. Unlike other geographic units we might use to measure population exposure, counties are the smallest unit of geography for which boundaries remained almost entirely unchanged during the 40-year period. We located the population-weighted centroid for each county using block group population and boundary data from the 1990 Census, which was conducted approximately at the midpoint of our study period. We then assigned weights to GSOD and USHCN weather stations located within 160 km of a county's population centroid based on the inverse of the station's squared distance from the centroid. Counties with no weather station within 160 km from their centroids (n = 66 of the 3,103 counties in the contiguous United States) were dropped from the analysis; the counties remaining accounted for 98% of the 2010 contiguous US population. The median number of GSOD weather stations assigned to counties was 7; the median number of USHCN stations was 13 (Supplementary Table 1). Extended Data Figure 1 shows a map of the weather stations and counties in our data set.

Our findings are robust to other methods of matching weather conditions to the population. The results were similar when including only stations located within 80 km of population centroids (Extended Data Table 4d). In a separate analysis, we created a Voronoi polygon around each GSOD weather station with valid temperature, humidity and precipitation data (Extended Data Fig. 3) and then assigned population to the polygons by 1990 Census block groups. Using this method, which relies only on a single weather station's data for each block group and does not include USHCN data, we find results similar to those in our main analyses (Extended Data Table 5).
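A sketch of the inverse-squared-distance weighting used above to assign station observations to county population centroids; the haversine distance, the 160 km cut-off handling and the data layout are assumptions for illustration.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    r = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def county_weather(centroid, stations, values, max_km=160.0):
    """Inverse-squared-distance weighted average of station values for one county.

    centroid : (lat, lon) of the county's population-weighted centroid.
    stations : list of (lat, lon) station coordinates.
    values   : station observations (e.g. January mean of daily maximum temperature).
    Returns None if no station lies within max_km, mirroring the counties dropped
    from the analysis.
    """
    num = den = 0.0
    for (slat, slon), v in zip(stations, values):
        d = haversine_km(centroid[0], centroid[1], slat, slon)
        if d <= max_km:
            w = 1.0 / max(d, 1e-6) ** 2   # inverse of squared distance to the centroid
            num += w * v
            den += w
    return num / den if den > 0 else None

# Illustrative use with made-up coordinates and values; the third station is beyond 160 km.
print(county_weather((44.97, -93.26),
                     [(45.07, -93.35), (44.88, -93.22), (46.84, -92.19)],
                     [22.0, 24.5, 18.0]))
```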
We used daily data to calculate monthly averages by weather station for each of the weather indicators in our data set, yielding data at the station × year × month level, and then calculated annual values of January average daily maximum temperature, amount of precipitation, and number of days on which precipitation occurred. We used standard formulas to calculate July average daily mean relative humidity38 and July daily heat index39.

Because we were interested in Americans' experience with the weather rather than distinguishing between short-term natural variability and long-term climate trends, we did not adjust the data to remove urban heat island effects. We also did not adjust for changes in instrumentation or observation routine. Research on the effects of these changes on temperature measurements suggests that the effects are modest and should bias results against our findings. The transition from afternoon to morning temperature observations and the adoption of electronic instruments both had the effect of recording lower maximum temperatures4, 40, 41. These effects do not seem to vary between winter and summer41. Because warming that has occurred over the last 40 years has been more pronounced and widespread in winter than in summer, instrument changes would result in understating the amount of January warming that has occurred and the corresponding increase in WPI. The effects of instrument changes on dew point temperature, and thus relative humidity measurements, are less systematic over time, but they have been detected at only a small proportion of stations24. Considering the modest role that relative humidity plays in our preference model and the limited evidence that instrumentation substantially affects measurements, we determined that data adjustments were not necessary.

We assigned weather station data to counties using the inverse-distance weights described earlier. Because our interest was in population exposure, we weighted our annual indicators by 2010 county population42. We used constant population weights, rather than adjusting them for shifts over time in county population, to isolate the impact of weather trends from any changes in aggregate exposure that are attributable to population migration or growth. As shown in Table 1, using population weights from 1970 rather than 2010 or unweighted values produced similar results. Summary statistics for the county-level weather indicators over our 40-year study period are reported in Extended Data Table 1; mean values of WPI by county are shown in Fig. 2a.

To produce the results reported in the paper, we used a WPI derived from a population growth model reported in a widely cited study14 (table 3, model 6 of ref. 14). The model includes both linear and quadratic weather terms to flexibly assess preferences about weather. In the model used here, county population growth from 1970 to 2000 is a function of five long-term normal weather indicators: January average daily maximum temperature (JAN_MAX); July daily heat index (JULY_HI); July average daily mean relative humidity (JULY_RH); annual precipitation (PRECIP_IN); and the number of days on which precipitation occurs annually (PRECIP_DAYS). Control variables include county geographical, coastline and topological features, baseline population density and total population, and baseline shares of county population employed in different sectors, including those tied closely to weather such as agriculture and transportation. Taking the reported coefficients estimating the partial relationships between the weather indicators included in the model and population growth, we calculated a WPI score for each county j in each year t as

WPI_jt = Σ_k (β_k · x_k,jt + γ_k · x_k,jt²)   (1)

where x_k,jt is the mean-centred value of weather indicator k (JAN_MAX, JULY_HI, JULY_RH, PRECIP_IN or PRECIP_DAYS) for county j in year t, and β_k and γ_k are the linear and quadratic coefficients reported for that indicator in ref. 14. All weather indicators are centred at their means, and thus the linear term coefficients can be interpreted as the effect of a one-unit shift in the indicator on WPI at the indicator's mean value. As shown in equation (1), the analyses in this study (and all published studies from which we derive WPIs) were conducted using US conventional (imperial) units of measure. To comport with these studies, we employed imperial units in our calculations of all WPIs and then transformed results into SI units to report temperature and precipitation trends.
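The WPI calculation in equation (1) could be sketched as follows. The coefficient and mean values below are hypothetical placeholders, not the estimates reported in ref. 14, and would need to be replaced with the published table 3, model 6 values.

```python
# Weather Preference Index (WPI) for one county-year, following the structure of
# equation (1): mean-centred indicators entered with linear and quadratic terms.
# All coefficient and mean values are HYPOTHETICAL placeholders, not those in ref. 14.

INDICATORS = ["JAN_MAX", "JULY_HI", "JULY_RH", "PRECIP_IN", "PRECIP_DAYS"]

# Placeholder (linear, quadratic) coefficients per indicator.
COEFS = {
    "JAN_MAX":     (0.010, -0.0001),
    "JULY_HI":     (-0.008, -0.0002),
    "JULY_RH":     (-0.002, 0.0),
    "PRECIP_IN":   (-0.003, 0.0001),
    "PRECIP_DAYS": (-0.004, 0.0),
}

# Placeholder long-run means used to centre each indicator (imperial units).
MEANS = {"JAN_MAX": 44.0, "JULY_HI": 88.0, "JULY_RH": 58.0,
         "PRECIP_IN": 38.0, "PRECIP_DAYS": 110.0}

def wpi(county_year: dict) -> float:
    """WPI score for one county-year given its five weather indicators (imperial units)."""
    score = 0.0
    for name in INDICATORS:
        x = county_year[name] - MEANS[name]   # centre the indicator at its mean
        beta, gamma = COEFS[name]
        score += beta * x + gamma * x ** 2    # linear plus quadratic term
    return score

# Illustrative county-year.
print(wpi({"JAN_MAX": 48.0, "JULY_HI": 92.0, "JULY_RH": 60.0,
           "PRECIP_IN": 41.0, "PRECIP_DAYS": 118.0}))
```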
We checked the robustness of our finding by calculating alternative WPIs based on five other published analyses9, 11, 12, 13 estimating the effect of climate amenities on local population growth. These studies employ simpler treatments of climate amenities, in some cases including only two or three indicators related to temperature, precipitation or humidity. All treat summer and winter temperatures separately. For each study, we developed a WPI based on reported coefficients on the study's weather-related variables (Extended Data Tables 2 and 3). We then used our county-level weather data to calculate annual WPI scores for all US counties from each of these WPI formulas. We were able to measure all weather-related variables at the county level over the entire 40 years except for sunshine hours, a variable that appears in two of the models12, 13. Our estimates therefore assume no long-term change in the amount of sunshine experienced by individual counties.

County-level temperature estimates for the RCP4.5 and RCP8.5 emissions scenarios came from the NCCV, which uses NASA Earth Exchange Downscaled Climate Projections (NEX-DCP30) data to project future changes in climate and water balance for states, counties and hydrologic units26, 27. The NEX-DCP30 data set statistically downscales projections from 33 models included in phase 5 of the Coupled Model Intercomparison Project (CMIP5) to an 800-m grid. The NCCV includes 30 of the 33 models that cover both emissions scenarios and creates area-weighted averages at the county level. Consistent with the NCCV's presentation of these data, we have examined projections over three time periods (2025–2049, 2050–2074 and 2075–2099) under the two emissions scenarios, comparing mean WPI values within each time period to the observed 1974–2013 means for every county (Extended Data Table 6).

Because of data availability, we estimated changes in WPI based only on changes in summer and winter temperatures, weighting counties by their 2010 populations and fixing other weather indicators at their means for the final 10 years of our study period. Consistent with studies of CMIP5 model performance, we found discrepancies between the observed temperature record and hindcasts yielded by climate models7, 43, with the effect that simulated temperature data for the 40-year historical period of our study produce average WPI scores that are lower than those calculated using observed data. Recognizing that discrepancy between modelled and observed temperatures may persist into the future, we performed an additional analysis in which we regression-adjusted projected future WPI scores under all time frames and scenarios to account for the discrepancy. The adjustment was performed by regressing observed annual WPI on simulated WPI derived from the CMIP5 models for the 1974–2013 period, yielding a fitted linear relationship of the form WPI_obs,t = α + β · WPI_sim,t. We adjusted the projections in Fig. 3a using predictions from this model and display the adjusted projections in Fig. 3b.

We obtained our international projections using the Royal Netherlands Meteorological Institute's Climate Change Atlas, which provides CMIP5 climate model output for a variety of countries, seasons, time periods and scenarios through its web-based interface32. For each country, we obtained the mean surface-averaged projections of change in maximum winter and summer near-surface temperatures in the 2075–2099 period under RCP4.5 and RCP8.5 with respect to the reported mean of those observed in the 1974–2013 period (Extended Data Table 7).
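A sketch of the regression adjustment described above: observed annual WPI for 1974–2013 is regressed on WPI computed from the CMIP5 simulations of the same period, and the fitted line is then applied to the projected WPI values (as displayed in Fig. 3b). The array names and the synthetic data are illustrative only.

```python
import numpy as np

def adjust_projections(wpi_observed, wpi_simulated_hist, wpi_projected):
    """Regression-adjust projected WPI for the model-versus-observation discrepancy.

    wpi_observed       : observed annual WPI, 1974-2013 (length 40).
    wpi_simulated_hist : WPI computed from CMIP5-simulated temperatures, same years.
    wpi_projected      : WPI computed from projected temperatures (any time frame/scenario).
    Returns the adjusted projections, i.e. predictions from the fitted line.
    """
    slope, intercept = np.polyfit(wpi_simulated_hist, wpi_observed, deg=1)
    return intercept + slope * np.asarray(wpi_projected)

# Illustrative use with synthetic series in which simulated WPI runs systematically low.
rng = np.random.default_rng(0)
sim_hist = rng.normal(-0.5, 0.3, size=40)
obs = 0.2 + 0.9 * sim_hist + rng.normal(0, 0.05, size=40)
projected = np.array([-0.2, 0.1, 0.4])
print(adjust_projections(obs, sim_hist, projected))
```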


McCaa R.,Minnesota Population Center | Ruggles S.,Minnesota Population Center | Sobek M.,Minnesota Population Center
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2010

In the last decade, a revolution has occurred in access to census microdata for social and behavioral research. More than 325 million person records (55 countries, 159 samples) representing two-thirds of the world's population are now readily available to bona fide researchers from the IPUMS-International website (www.ipums.org/international), hosted by the Minnesota Population Center. Confidentialized extracts are disseminated on a restricted-access basis at no cost to bona fide researchers. Over the next five years, drawing on the microdata already entrusted by National Statistical Office owners, the database will encompass more than 80 percent of the world's population (85 countries, ~100 additional datasets), with priority given to samples from the 2010 round of censuses. A profile of the most frequently used samples and variables is drawn from 64,248 requests for microdata extracts. The development of privacy protection standards by National Statistical Offices, international organizations, and academic experts is fundamental to eliciting world-wide cooperation and, thus, to the success of the IPUMS initiative. This paper summarizes the legal, administrative, and technical underpinnings of the project, including statistical disclosure controls, as well as the conclusions of a lengthy on-site review by the former Australian Statistician, Mr. Dennis Trewin. © 2010 Springer-Verlag Berlin Heidelberg.
