bd CPS version 0.3 released

Version 0.3 of my notebooks for cleaning up and working with Current Population Survey public use microdata is available on GitHub. Several new variables were added, much of the code was refactored for speed, and several bugs were fixed. The new version makes use of Census revised weights for 2000-2002 and December 2007, revised data on union membership and coverage in 2001 and 2002, and data on professional certification for 2015 and 2016. There is also a new notebook for creating extracts for 1989-93 from microdata hosted by NBER.

I’m looking into how to host the actual data files from this project for free or very cheaply, since they would be useful to people who know Python and want to work with CPS data. Each annual file is about 30 MB after compression. Any suggestions are welcome.

As always, please contact me ( if you find any errors or have any questions.

Six southern US metro areas: part 3 – education and school enrollment

Part three in the series on mid-sized metro areas around the southern portion of the Appalachian mountains looks at educational attainment and school enrollment for men and women, and at how they compare to the US as a whole.

The six areas of interest are: the Chattanooga-Cleveland-Dalton, TN-GA combined statistical area, the Greenville-Anderson-Spartanburg, SC combined statistical area, the Asheville, NC core-based statistical area, the Johnson City-Kingsport-Bristol, TN-VA combined statistical area, the Huntsville, AL core-based statistical area, and the Knoxville, TN core-based statistical area. See the first post in the series for more background.

The source for these results is 24 months of aggregated CPS microdata, covering January 2017 to December 2018.

Highest level of education attained

When defining education levels for adults, it is customary to identify the highest level of education someone has attained based on five categories: 1) people without a high school degree, 2) those with a high school degree or GED but no college, 3) those with some college but no degree or a two-year degree, like an associate degree, 4) people with a bachelor’s degree, and 5) people with an advanced degree like a master’s degree, law or medical degree, or PhD.

I’ve used this grouping to calculate the educational distribution for men and women, age 25-54, in each area and in the US as a whole. Much like previous results in the series, there is an interesting divergence between areas, and also between men and women within areas.


Overall, people in the 25-54 age group in Huntsville are the most likely to have an advanced degree. However, the result is much stronger for men (20.3%) than for women (14.3%). Women in Huntsville are no more likely to have an advanced degree than women in the US as a whole. Other than Huntsville, none of the six areas has an above-US-average likelihood of having an advanced degree. It is also interesting to observe that Huntsville is the only area of the six where 25-54 year old women are less likely to have a high school degree than men. Huntsville’s share of age 25-54 men without a high school degree is nearly half the nationwide average.

The Asheville area has the largest gap between men and women in educational attainment. In Asheville, 42.1% of women age 25-54 have a bachelor’s degree or more, compared to only 26.7% of men. Men in Asheville, like those in Knoxville, Greenville, and especially Chattanooga, are less likely to have a high school degree than men in the US as a whole. In contrast to Asheville, in the Chattanooga-Cleveland-Dalton area the share of men (28.8%) and women (28.5%) with a bachelor’s degree or more is almost identical.

The educational distribution for 25-54 year olds is fairly similar between the Johnson City-Kingsport-Bristol, Knoxville, and Greenville-Anderson-Spartanburg areas, with two exceptions. First, in the Johnson City-Kingsport-Bristol area, men are far more likely to have a high school degree than men in the other areas. Second, in Knoxville, like Huntsville, men are more likely than women to have an advanced degree.

School enrollment among young people

School enrollment among people age 18 to 24 varies greatly across the six areas. In Huntsville, more than half (55.5%) of men in the age group are enrolled in school (college, university, or high school). Huntsville is the only one of the six areas where young men are more likely to be in school than young women; however, school enrollment is still higher for young women in the area than for young women nationwide.

School enrollment rates in the Chattanooga, Greenville, and Knoxville areas are similar to the US-wide average. Among 18-24 year olds in the Johnson City-Kingsport-Bristol area, both men (34.9%) and women (37.8%) are far less likely to be enrolled in school than those in the US as a whole. Among women age 18-24 in the six areas, those in Johnson City-Kingsport-Bristol are the least likely to be enrolled in school.

Men in Asheville stand out in the school enrollment data, with only 21.9% enrolled during ages 18-24, compared to 38.9% for women in the area.


To look at school enrollment for a narrow age group (those 18-24), I used four years of aggregated CPS microdata (January 2015 to December 2018). However, there were only 138 valid observations for men age 18-24 in Asheville (by population the smallest area of the six). To check that the result from the four combined years of data is meaningful, I applied the same calculation to each of the four individual years of data. The results were fairly consistent from year to year.
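The year-by-year robustness check described above can be sketched as follows. This is a minimal illustration, not the notebook’s code. It assumes each observation is a hypothetical `(year, enrolled, weight)` tuple, where `weight` stands in for the CPS person weight. It then computes the weighted enrollment rate for the pooled sample and for each year separately.

```python
from collections import defaultdict


def weighted_rate(obs):
    """Weighted share of (year, enrolled, weight) observations with enrolled True."""
    total = sum(weight for _, _, weight in obs)
    enrolled = sum(weight for _, is_enrolled, weight in obs if is_enrolled)
    return enrolled / total if total else float("nan")


def rates_by_year(obs):
    """Weighted enrollment rate computed separately for each year."""
    groups = defaultdict(list)
    for row in obs:
        groups[row[0]].append(row)  # row[0] is the year
    return {year: weighted_rate(rows) for year, rows in sorted(groups.items())}


# Hypothetical observations (year, enrolled, weight), for illustration only.
sample = [
    (2015, True, 1200.0), (2015, False, 3000.0),
    (2016, True, 1100.0), (2016, False, 2900.0),
]
print(weighted_rate(sample))
print(rates_by_year(sample))
```

If the yearly rates scatter widely around the pooled rate, a pooled estimate built on a small cell (such as 138 observations) deserves less confidence.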

It’s worth noting that the school enrollment variable is derived from a household survey and asks whether anyone in the household was enrolled in school in the previous week. This detail matters for several reasons. First, young people living in a dorm will only be included if their dorm room is part of the survey (not if their parents’ household is in the survey). Second, some of the six areas have large colleges and universities where people from all over the world are locally enrolled in school and can therefore be part of the survey. Third, the data are from monthly files, so those who are in school for eight months of the year would answer “no” during any of the four months that they are not in school.

The Jupyter notebook used in this analysis is here. The next blog post in the series will look at what share of people in each area are working, unemployed, or not in the labor force, compared to the nation as a whole.

Six southern US metro areas: part 2 – race, ethnicity, and country of origin

Today, I continue my look at six mid-sized cities around the southern portion of the Appalachian mountains. This post examines how the racial, ethnic, and national backgrounds of people in the area differ from the US as a whole. The results surprised me.

As a reminder (see yesterday’s post for more background), the six areas of interest are: the Chattanooga-Cleveland-Dalton, TN-GA combined statistical area, the Greenville-Anderson-Spartanburg, SC combined statistical area, the Asheville, NC core-based statistical area, the Johnson City-Kingsport-Bristol, TN-VA combined statistical area, the Huntsville, AL core-based statistical area, and the Knoxville, TN core-based statistical area.

As in the previous post, the source for these results is 24 months of aggregated CPS microdata, covering January 2017 to December 2018.

Race and ethnicity

The first section compares the racial and ethnic makeup of each area to the national average. The racial and ethnic categories are defined in such a way as to not overlap and to cover the entire population: white only (non-Hispanic), black only (non-Hispanic), Asian only (non-Hispanic), Native American only (non-Hispanic), more than one race (non-Hispanic), and Hispanic (any race).

The black share of the population varies greatly by city, with Huntsville (19.7%) and Greenville (18.9%) well above the US average of 12.3%. The black share of the Chattanooga area is similar to that of the US as a whole. In contrast, the black share of the population in Asheville (9%), Knoxville (6%), and Johnson City-Kingsport-Bristol (2.5%) is far below the national average. People in these areas are much more likely to be white than in other parts of the US, and particularly, in other parts of the south.


The Hispanic share of the population is more consistent across the six areas but is far below the nationwide average. In the US as a whole, 18.3% of the population is of Hispanic origin. Only 5% of the population of the six areas is Hispanic, with the largest Hispanic share of the population in the Greenville area (6.5%). The Johnson City-Kingsport-Bristol area has the lowest Hispanic share of the population (3.1%).


The Asian share of the population in these six areas (2%) is likewise substantially below the nationwide average (6.1%), though the Asian share of the population in the south region as a whole (3.9%) is also below the national average. The Asian share of the population in the Huntsville area (3.5%) is the highest among the six areas. Chattanooga (1.5%) and Asheville (1%) have the lowest Asian share of the population.


Changing concepts slightly, the share of children (under age 16) who are of more than one race (and not of Hispanic origin) provides additional insight into each of these six areas. In this category, the Greenville (5.9%) and Chattanooga (5.2%) areas are above the national average (4.2%). In contrast, the share of children with more than one race is particularly low in the Asheville area (0.8%).


Country of birth

Interestingly, the foreign-born share of the population in these six areas (5.5%) is far below the national average (13.7%) and the average for the south region (12.7%). None of the six areas has even half the foreign-born share of the US as a whole. Greenville has the largest share of its population born outside the US (6.5%), and Chattanooga (4%) has the lowest.


Finally, I combined four years of microdata to get a sufficient sample for identifying individual countries of birth in each of the six areas. Even though people in the six areas are very likely to be born in the US, the data suggest that, relative to the US as a whole, there are significant communities of people born in certain countries in five of the six areas.

In Chattanooga, the Guatemalan-born population is above the US-wide average. In Greenville, there is an above-average Russian-born population. In Asheville, people are more likely to be born in Canada and the Philippines. Huntsville has a German- and Philippines-born population that exceeds the national average. Lastly, people in the Knoxville area are disproportionately likely to have been born in Sudan and Turkey.

The Jupyter notebook used for this analysis is here.

The next blog post will look at education levels and school enrollment.

Six southern US metro areas: part 1 – age and family structure

While it’s snowing here in DC, frigid in the northern US, and Florida is full of snowbirds, there exists a theoretical climate happy medium in the southern Appalachian region of the US. This magical area gets four seasons, has mountains nearby for hiking and clearing out unproductive thought patterns, and yet doesn’t get super cold. But before I decide to move to this region, I should probably know more about it. Fortunately, the Current Population Survey (CPS) can help.


What follows is the first in a series of blog posts about six mid-sized metro areas in the region that surrounds the southern portion of the Appalachian mountains. The six areas are: the Chattanooga-Cleveland-Dalton, TN-GA combined statistical area, the Greenville-Anderson-Spartanburg, SC combined statistical area, the Asheville, NC core-based statistical area, the Johnson City-Kingsport-Bristol, TN-VA combined statistical area, the Huntsville, AL core-based statistical area, and the Knoxville, TN core-based statistical area.


These mid-sized areas are likely influenced by three major cities nearby (Charlotte, Nashville, and Atlanta); however, I’m going to focus only on the mid-sized areas, which perhaps get less analytical attention.

The first post in the series will cover population, age composition, and family structure for people age 22 to 32. Specifically, I’ll look at whether people in the age 22-32 group are married and whether they have kids. Future blog posts will cover education, industry composition and occupation composition, labor market status (employed, unemployed, why not in the labor force), hours worked and wages, and finally, union membership and professional certification.

To get a sufficient sample size, the data in today’s post are drawn from 24 months of aggregated CPS microdata, covering 2017 and 2018. The wage discussion will likely use three or four years of data, since wage questions are asked of only one-quarter of the CPS sample.


CPS-based-estimates of population for the six areas are as follows:

  • Chattanooga-Cleveland-Dalton, TN-GA: 802,000
  • Greenville-Anderson-Spartanburg, SC: 1,270,000
  • Asheville, NC: 463,000
  • Johnson City-Kingsport-Bristol, TN-VA: 505,000
  • Huntsville, AL: 478,000
  • Knoxville, TN: 853,000
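Each monthly CPS file is weighted to represent the full population. Pooling 24 months and summing the weights therefore counts everyone roughly 24 times. The minimal sketch below assumes a hypothetical list of `(month, area, weight)` records. It sums the weights for each area and divides by the number of distinct months pooled. It illustrates the general idea rather than reproducing the notebook’s method.

```python
from collections import defaultdict


def pooled_population(records):
    """Average monthly weighted population by area from pooled monthly files.

    records: iterable of (month, area, weight) tuples, where weight is a
    person-level weight from that month's file (hypothetical layout).
    """
    months = set()
    totals = defaultdict(float)
    for month, area, weight in records:
        months.add(month)
        totals[area] += weight
    n_months = len(months)
    # Divide by all months pooled, so each result is an average monthly count.
    return {area: total / n_months for area, total in totals.items()}
```

Dividing by the total number of months pooled, rather than by the months in which an area happens to appear, keeps each figure an average monthly population.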

Age composition of population

My first question is what share of people living in these areas are children (under 16) and what share are retirement age (over 64).

The share of the population age 15 or younger is at or below the US average in five of the six areas. The Greenville area has the highest share of children (20.7% of the population) and is the only area with an above-average share. The Asheville area has the lowest (15.1%).


The age 65 or older share of the population varies between the six areas, with four of the six areas having an above-average retirement-age share. The highest age 65 or older share of the population is in Asheville (22.1%) and the lowest is in Huntsville (11.3%).


Marriage rates among those age 22-32

Next, I look at family structure among those age 22-32. Student debt and expensive housing, among other things, tend to reduce marriage rates for young people. I’m curious how young people’s marriage rates compare between these six areas and the nation as a whole.

Five of the six areas (Asheville is the exception again) have above-average marriage rates for those age 22-32. The highest age 22-32 marriage rate is in the Johnson City-Kingsport-Bristol area, where 40.8% of the age group is married. In Asheville, 31% of the age group is married, just under the 32% nationwide rate. Marriage rates are also well above average in Knoxville (39.7%) and in the Chattanooga area (39.4%).

My suspicion here is that student debt levels and housing prices are below national averages for much of this region, which makes starting a new household easier.


Share of 22-32 year olds with kids

Finally, I want to look at how many 22-32 year olds have kids in each of the areas, compared to the US as a whole.

The share of 22-32 year olds with kids varies among the six areas. As with marriage rates, the Johnson City-Kingsport-Bristol area has the highest share of 22-32 year olds with at least one kid, at 41.9%, compared to 29.3% nationwide. Knoxville (34.7%) and the Chattanooga area (34.1%) also have above-average parenthood rates. Huntsville (22.9%) and Asheville (24.7%) have low rates of parenthood among 22-32 year olds.


The share of 22-32 year olds with two or more kids is above the national average for five of the six areas. Chattanooga (20.1%), Johnson City-Kingsport-Bristol (20%) and the Greenville area (19.3%) are well above the national average of 16.2%. Interestingly, Huntsville is far below the national average, with 8.7% of the age group having two or more kids. The Huntsville area has a large military population, but people in the Armed Forces are already excluded from this dataset, so that can’t be the explanation.


The next blog post in the series will look at education, industries, and occupations.

The Jupyter notebook used in this analysis is available here.

Do regional inflation differences affect nationwide real wage estimates?

Some basic facts about the US economy point to the possibility that inflation is overstated in measures of real wage growth for workers at the very bottom of the wage distribution. If this is true, it would mean that low-wage earners have actually seen more wage growth than published estimates suggest.

First, the places that have a higher state or local minimum wage than the federal minimum wage tend to be in the west region of the US, and sometimes in the northeast or midwest regions, but rarely in the south. These regional differences in minimum wage create large differences in the regional distribution of low wage earners. In November 2018, the population (including children) was divided by region as follows: midwest: 20.8%, northeast: 17.6%, south: 37.8%, and west: 23.8%. In contrast, people earning $8.00 per hour or less were distributed as follows: midwest: 19.2%, northeast: 16.3%, south: 51.3%, and west: 13.2%. In other words, low wage earners are disproportionately in the south and disproportionately not in the west.

Second, in recent years inflation has been higher in the west region, compared to the south or midwest regions. This is largely because inflation has recently been driven by housing shortages, which are particularly severe in the west region. Price growth from December 2017 to December 2018 was 1.9% nationwide, 1.3% in the midwest, 1.7% in the northeast, 1.5% in the south, and 3.1% in the west.

The above creates a potential issue for estimates of real wage growth for low wage earners. This is because nearly all such estimates apply the nationwide rate of inflation (for all urban consumers) to workers who disproportionately live in the south, which has less inflation than the nation as a whole. Likewise, low-wage workers are not nearly as likely to live in the west, which has higher inflation than the nation as a whole.
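A back-of-the-envelope version of this argument uses only the figures above. As a simplifying assumption, the sketch weights each region’s December 2017 to December 2018 price growth by that region’s share of people, or of people earning $8.00 per hour or less. The actual CPI uses expenditure weights instead, so the sketch approximates the published national rate rather than reproducing it.

```python
# Regional price growth, Dec 2017 to Dec 2018 (percent), from the text above.
inflation = {"midwest": 1.3, "northeast": 1.7, "south": 1.5, "west": 3.1}

# Regional shares (percent), November 2018, from the text above.
population_share = {"midwest": 20.8, "northeast": 17.6, "south": 37.8, "west": 23.8}
low_wage_share = {"midwest": 19.2, "northeast": 16.3, "south": 51.3, "west": 13.2}


def share_weighted_inflation(shares):
    """Average regional inflation, weighted by each region's share (simplification)."""
    total = sum(shares.values())
    return sum(shares[region] * inflation[region] for region in shares) / total


print(f"population-weighted: {share_weighted_inflation(population_share):.2f}%")
print(f"low-wage-weighted:   {share_weighted_inflation(low_wage_share):.2f}%")
```

With these inputs, the population-weighted figure is about 1.87 percent, close to the 1.9 percent national rate. The low-wage-weighted figure is about 1.71 percent, which is the direction of bias described above.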

To test this, I used CPS microdata to calculate the real wage for the fifth percentile wage earner (nationwide) using both the usual CPI-U (the CPI-U-RS is preferred, but it would have complicated the analysis) and the regional CPI for each of the four regions. That is, in the regional CPI estimate, each wage observation is adjusted using the price index for the region where the person lives.
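The two deflation approaches can be sketched as follows. This is a minimal illustration under stated assumptions, not the notebook’s code:

- Each observation is a hypothetical `(wage, weight, region)` tuple.
- Price indexes are plain dictionaries keyed by region.
- The weighted percentile uses one simple definition among several: the first wage at which cumulative weight reaches the target share.

```python
def weighted_percentile(values, weights, q):
    """Smallest value whose cumulative weight share reaches q (0 < q <= 1)."""
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= q * total:
            return value
    return pairs[-1][0]


def real_wages(obs, cpi_now, cpi_base):
    """Convert nominal wages to base-period dollars, region by region.

    obs: list of (wage, weight, region); cpi_now and cpi_base map a region
    to its price index. For the national approach, pass dictionaries that
    hold the same national index value for every region.
    """
    return [wage * cpi_base[region] / cpi_now[region] for wage, _, region in obs]
```

Calling `weighted_percentile(real_wages(obs, cpi_now, cpi_base), [w for _, w, _ in obs], 0.05)` once with regional indexes and once with the national index, period by period, gives the two fifth-percentile series being compared.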


The results show that the story above does apply, but that the effect is very small and does not accumulate much over time: the total cumulative difference since 1989 is less than three percent. In other words, it’s true that low wage earners did better than published estimates suggest, but not by much. It’s important to point out that adjusting for age, or including only full-time workers, eliminates most of the cumulative impact. Findings this trivial usually aren’t written up as a blog post, but they might be of interest to people who have the same question. Here’s the Jupyter notebook used to calculate the results.

Comments and feedback are always welcome. Please let me know if I did something wrong, so I can learn!

Higher employment rates mean higher wages for low-wage workers

Low-wage jobs pay better when the local labor market is tight. This is because an employer who can’t easily replace one worker with another will tend to pay current workers more to incentivize them to stay. That employer will also tend to invest more in equipment and training, which makes workers more productive and grows the economy.

One way to see this relationship is to look at what percent of people living in an area have a job and the wages near the bottom of the area’s wage distribution. For example, for workers between the ages of 25 and 54 in the ~100 largest metro areas, a one percentage point increase in the share of the age group with a job results in $0.13 an hour in additional wages for the first decile full-time wage earner in the metro area, equivalent to more than $200 extra per worker per year (in November 2018 dollars).
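An estimate like the $0.13-per-point figure is the slope from a cross-metro regression of the first-decile full-time wage on the employment rate. The sketch below is an illustration under assumptions, not the analysis code. It fits a weighted least-squares line through one point per metro area; the weights might be, for example, the number of valid wage observations. It then converts an hourly gain to an annual one, assuming 2,080 hours (40 hours for 52 weeks).

```python
def weighted_slope(x, y, w):
    """Slope from a weighted least-squares fit of y on x (with intercept)."""
    total = sum(w)
    x_bar = sum(wi * xi for xi, wi in zip(x, w)) / total
    y_bar = sum(wi * yi for yi, wi in zip(y, w)) / total
    cov = sum(wi * (xi - x_bar) * (yi - y_bar) for xi, yi, wi in zip(x, y, w))
    var = sum(wi * (xi - x_bar) ** 2 for xi, wi in zip(x, w))
    return cov / var


HOURS_PER_YEAR = 2080  # assumption: full-time work, 40 hours x 52 weeks


def annual_gain(slope_per_point, points):
    """Annual dollars implied by an hourly slope and a gain in employment-rate points."""
    return slope_per_point * points * HOURS_PER_YEAR


print(annual_gain(0.13, 1))  # $0.13 an hour per point, annualized
```

Under the 2,080-hour assumption, one percentage point at $0.13 an hour works out to roughly $270 a year, consistent with “more than $200.”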

Employment rates (the share of the age group with a job) have been rapidly increasing since 2012 and show no sign of slowing (and perhaps even show signs of accelerating, as higher wages pull more people off the sidelines). If the trend is allowed to continue until the employment rate returns to its late 1990s rate, the employment-rate-related real wage boost for low-wage full-time workers would be between $500 and $600 a year.


Data source

This relationship has been pointed out countless times, for example by Dean Baker, but it is worth reiterating using the latest data. I’ve calculated these figures from the Current Population Survey public use microdata, using the latest two years of monthly data (December 2016-November 2018). The results are stored here in a CSV file, for those curious about which dot is which metro area. I’ve also added the union membership rate and the unemployment rate for each area to the file. Wages are in November 2018 dollars, adjusted for inflation within the 2-year period by the regional CPI-U. The largest metro areas are the 97 core-based statistical areas (CBSAs) with at least 300 valid wage observations during the 2-year period. The Python code that generates the results is available here. I use a set of programs called the bd CPS, available as a GitHub repo, to standardize the CPS data from 1994 to present.


Possible pieces of the long-term unemployment puzzle

Top US economists recently shared insights and data on the increase in the share of unemployed who are long-term unemployed. While unemployment is currently low, about one-fifth of those who are without a job and looking for one have been doing so for more than six months. As Martha Gimbel pointed out, in “1969 (the last time the unemployment rate was at 3.7%), [long-term unemployed] were 4.7% of the unemployed.” In other words, the long-term unemployed used to make up a much smaller share of the unemployed during periods of low unemployment.

Abigail Wozniak added that “more [long-term unemployment] is a clear trend,” citing research done for the CEA that gives insight into the trend. In light of these data, it’s very hard to argue that something structural is not happening here.

I’m hoping to add a few points to the discussion, the first of which I have not seen made. To do so, I focus on the period from 1998 to present, which has a more stable female labor force participation rate than the period from 1969 to 1997. In 1998, the long-term share of unemployed was 14.1 percent, compared to 21.6 percent in the 12 months ending November 2018. The source for the following calculations and charts is basic CPS microdata.

Educational specialization

In 1998, the highest level of education for 46 percent of the US labor force was a high school degree or less, while about 25 percent had a bachelor’s degree or more (the rest had either some college but no degree, or an associate degree). In contrast, the most recent twelve months of data show 35 percent of the workforce with a high school degree or less and 37 percent with a bachelor’s degree or more. One result of this long-term structural change, almost by definition, is increased specialization in the educational training of the workforce.

Educational specialization may mean less fungibility in employment. To illustrate this point in a highly exaggerated way, it may take an unemployed person with a BA in Mesopotamian history longer to find a job in the field than it does for someone with a GED who quit at Wendy’s. This explanation may be particularly relevant during a period with a tight labor market and a growing economy, when an unemployed person with advanced education is less desperate to take the first available job and when job opportunities are much more plentiful for those with less education.

We can see this point by looking at the long-term share of unemployed by their highest level of educational attainment (figure 1). Among unemployed with a bachelor’s degree, 23.8 percent are long-term unemployed (27 weeks or more), compared to 18.7 percent for those without a high school degree.
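The long-term share in figure 1 is a weighted ratio within each group: unemployed people whose current spell has lasted 27 weeks or more, divided by all unemployed people in the group. Here is a minimal sketch, assuming hypothetical `(group, weeks_unemployed, weight)` tuples for unemployed respondents only. Passing age groups instead of education groups gives the kind of breakdown shown in figure 2.

```python
from collections import defaultdict

LONG_TERM_WEEKS = 27  # 27 weeks or more counts as long-term unemployed


def long_term_share_by_group(unemployed):
    """Weighted share of unemployed in each group with 27+ weeks of unemployment."""
    long_term = defaultdict(float)
    total = defaultdict(float)
    for group, weeks, weight in unemployed:
        total[group] += weight
        if weeks >= LONG_TERM_WEEKS:
            long_term[group] += weight
    return {group: long_term[group] / total[group] for group in total}
```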

Figure 1.


So there are two things going on here. First, the advanced-education share of the overall labor force has increased. Second, more-educated unemployed people are more likely to be long-term unemployed. The effect of combining these two is a higher long-term share of all unemployment.

This composition story is further supported by the fact that college graduates have become relatively more likely to be unemployed. During 1998, the unemployment rate for college grads was 2.0 percent, compared to 2.5 percent over the past year, and the unemployment rate for those without a high school degree was 10.5 percent, compared to 8.1 percent over the past year.

Age and unemployment duration

But education is only a small part of the story. This is obvious because the long-term unemployed share of every educational group has increased over the same 20-year period. A second explanation, which was covered by Gray Kimbrough in the original Twitter discussion, is the age composition of the workforce. The same “specialization” story may apply to age.

If we look at the long-term share of unemployed by age, we see that older unemployed people are more likely to be long-term unemployed than younger unemployed people (figure 2). Among unemployed people age 21 to 25, for example, 16.7 percent have been unemployed for 27 weeks or longer, compared to 30.6 percent for unemployed people age 56 to 60.

Figure 2.


Again, like the education story, the composition of the labor force is shifting towards those more likely to be long-term unemployed. In 1998, people age 56 to 60 made up 5.5 percent of the labor force, compared to 9.4 percent today. The age 21-25 share of the labor force has been relatively stable over the period, declining slightly from 10.1 percent in 1998 to 10.0 percent over the last year. Likewise, unemployment rates by age further support the argument. In 1998, the unemployment rate for those age 56 to 60 was 2.6 percent, compared to 2.9 percent today. The 1998 unemployment rate for those age 21 to 25 was 7.0 percent, compared to 6.2 percent today.

Beyond the story of older workers being relatively more specialized, part of the relationship between age and long-term unemployment may be coming from older workers being pushed out of jobs. For more on this topic, please read the fascinating deep dive by ProPublica.

Estimating the compositional effect from age and education

While the evidence suggests that a more educated and older labor force explains some of the increase in the long-term share of unemployment, I’m not convinced it can fully explain the 7.5 percentage point increase in the measure over the last 20 years. Re-weighting the last 12 months of CPS labor force observations to match the education and age composition of the 1998 labor force shows that age and education can explain 2.2 percentage points of the 7.5 percentage point increase in the long-term share. The majority of the increase is therefore not explained by age and education.
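The re-weighting exercise can be sketched as cell-based reweighting. The steps are:

1. Group the labor force into age-by-education cells.
2. Scale each recent observation’s weight so that cell shares match the 1998 labor force.
3. Recompute the long-term share of unemployment.

This is a minimal sketch under assumptions, and the notebook may define cells and records differently. It assumes a hypothetical record layout of `(cell, unemployed, weeks_unemployed, weight)`. It also assumes that every recent cell with unemployed people also appears in the 1998 data.

```python
from collections import defaultdict


def cell_shares(labor_force):
    """Share of total labor-force weight in each cell."""
    totals = defaultdict(float)
    for cell, _, _, weight in labor_force:
        totals[cell] += weight
    grand_total = sum(totals.values())
    return {cell: total / grand_total for cell, total in totals.items()}


def long_term_share(labor_force, factors=None):
    """Weighted long-term (27+ weeks) share of the unemployed.

    labor_force: list of (cell, unemployed, weeks_unemployed, weight);
    factors: optional per-cell multipliers applied to the weights.
    """
    long_term = total = 0.0
    for cell, unemployed, weeks, weight in labor_force:
        if not unemployed:
            continue
        w = weight * (factors[cell] if factors else 1.0)
        total += w
        if weeks >= 27:
            long_term += w
    return long_term / total


def reweighted_long_term_share(recent, base):
    """Long-term share for `recent` after matching the cell composition of `base`."""
    base_shares = cell_shares(base)
    recent_shares = cell_shares(recent)
    factors = {cell: base_shares.get(cell, 0.0) / share
               for cell, share in recent_shares.items()}
    return long_term_share(recent, factors)
```

The portion of the increase attributed to composition is the actual recent long-term share minus the reweighted recent share.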

One business cycle with two troughs

As a last thought on the issue, measures such as the prime-age employment-to-population ratio show that the US labor market is still recovering from the recession of 2001. A long, jobless recovery creates a situation where people who might otherwise be in the labor force and short-term unemployed are instead in other categories, such as caring for elder relatives, disabled, or in school. If this is the case, it would increase the long-term share of unemployment over the period since 2001.

An alternative way to look at long-term unemployment is to consider the share of unemployed people who are employed one year later (figure 3 shows those age 25 to 54). The trend for this indicator suggests that if the US economy is allowed to run hot for a few more years, it may reverse the cyclical, rather than structural, portion of long-term unemployment.
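This indicator requires linking the same people across years. In the CPS rotation pattern, a household is interviewed for four months, leaves the sample for eight, and returns for four more. A person in month-in-sample 1 to 4 can therefore, in principle, be found in month-in-sample 5 to 8 exactly twelve months later. The rough, unweighted sketch below has these assumptions:

- It uses a hypothetical record layout (`hh_id`, `line_no`, `sex`, `age`, `mis`, `status`).
- It checks that sex matches and that age rose by 0 to 2 years. Real matching procedures usually refine this check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    hh_id: str    # household identifier (hypothetical field)
    line_no: int  # person line number within the household
    sex: int
    age: int
    mis: int      # month in sample, 1-8
    status: str   # "employed", "unemployed", or "nilf"


def share_employed_next_year(this_year, next_year, min_age=25, max_age=54):
    """Share of matched unemployed people (ages min_age-max_age) employed 12 months later.

    this_year and next_year hold Person records from the same calendar month,
    one year apart. Returns None if no one can be matched.
    """
    later = {(p.hh_id, p.line_no, p.mis): p for p in next_year}
    matched = employed = 0
    for p in this_year:
        if p.status != "unemployed" or not (min_age <= p.age <= max_age) or p.mis > 4:
            continue
        q = later.get((p.hh_id, p.line_no, p.mis + 4))
        # Guard against false matches: same sex, age up by 0 to 2 years.
        if q is None or q.sex != p.sex or not (0 <= q.age - p.age <= 2):
            continue
        matched += 1
        if q.status == "employed":
            employed += 1
    return employed / matched if matched else None
```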

Figure 3.


I hope the above helps to clarify or guide some thinking on this important topic, even if it can only explain about one-third of the puzzle.

Student debt explains why wage growth hasn’t caused inflation

The Fed has been raising interest rates because of concern that rising wages will lead to inflation. The problem with this argument is that, while wages are rising, inflation (about 2%) is coming from higher world energy prices and from housing shortages, not from broadly higher demand for goods or services. Importantly, household survey data offer the following explanation for how wages could be going up without causing inflation: household investment in education is the cause of higher wages. Data show that, from a macro perspective, the only people who got a raise are the ones with more education than their peers in previous decades. These are also the people who can’t spend as much as their peers from previous decades, because they borrowed to pay for their education.

Perhaps the easiest way to see the relationship between education and higher wages is to compare growth in the overall median real wage to growth in the median real wage for workers in three educational subgroups (figure 1). From 2000 to October 2018, median real weekly wages increased by 3.6 percent overall, but fell by more than 1.5 percent in each educational subgroup. The median worker with a bachelor’s degree or more earns 2.4 percent less in the latest data than the group’s median worker did in 2000. For the group with a high school degree or less, the median wage fell by 1.6 percent. For workers with some college but no degree or with an associate degree, the median wage fell by a whopping 7.7 percent.


Since the three educational subgroups make up 100% of the total, the logical explanation for this outcome is an upward shift in the educational composition of the workforce. That is, workers have much more education in 2018 than they did in 2000. Since people with more education tend to earn more money, the result of a more educated workforce is a higher paid workforce.
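This composition effect (a form of Simpson’s paradox) is easy to reproduce with made-up numbers. The example below uses purely hypothetical weekly wages, not CPS data. Both groups’ medians fall between the two periods. The overall median still rises, because the more-educated group grows from 4 of 10 workers to 6 of 10, even though its new workers are disproportionately lower paid.

```python
from statistics import median

# Hypothetical weekly wages, for illustration only (not CPS data).
early = {
    "less education": [400, 450, 500, 550, 600, 650],
    "more education": [700, 800, 900, 1000],
}
late = {
    "less education": [420, 470, 520, 560],
    "more education": [660, 680, 700, 800, 900, 1000],
}


def pooled_median(groups):
    """Median wage across all workers in every group."""
    return median(w for wages in groups.values() for w in wages)


for name in early:
    change = median(late[name]) / median(early[name]) - 1
    print(f"{name}: {change:+.1%}")
print(f"overall: {pooled_median(late) / pooled_median(early) - 1:+.1%}")
```

Here the two group medians fall by about 6 and 12 percent, while the pooled median rises from 625 to 670, about 7 percent.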

While people are generally aware that education levels have risen in the US, they may be surprised to see the extent to which this is true over the past 20 years, or how rapidly the trend accelerated in response to the great recession. In 2000, 40 percent of the full-time age 25-54 workforce had a high school degree or less, compared to 30.7 percent in the year ending October 2018 (figure 2a). Likewise, in 2000, 31.5 percent of the workforce had a bachelor’s degree or more, compared to 43 percent in 2018. The some college or associate degree group remained relatively stable in size, but its wage distribution shifted towards lower wage jobs.

By comparing how wages are distributed among those in each educational group between 2000 and the latest year of data, we can see that the jobs added in each educational group tend to be lower wage (not entirely, but disproportionately), while the jobs lost by each group tend to be higher wage (figure 2b). This suggests that education levels have risen faster than the corresponding availability of good new jobs.


One interpretation of what may be driving these results is that households, in general, were aware that real wages were falling during the relatively weak labor market from 2001 to present. Some households responded to the weak set of job opportunities by investing in education. These households generally received higher wages if they were able to get a degree and then translate the credential into a better job.

The educational investment was very expensive and so those higher wages are not likely to translate into more spending in the same way that wage growth from higher productivity and a tight labor market would. The families that were able to make this investment have income that is less disposable because they now have student loan debt.

To clarify this point, imagine that the labor market had remained tight from 2001 to the present and that we never had massive outsourcing of jobs or the Great Recession. Workers in this case could demand higher wages, and get them, without spending a fortune on education.

When workers’ wage increases do not come with debt attached, the new wage money is much easier to spend on additional goods and services, which could drive up prices. However, if wage increases come with debt attached, as they do in this recent case, then the relationship between higher wages and higher prices breaks down. As a result, the Fed really should worry less about a theoretical wage-price spiral and instead focus on the possibility of achieving full employment before the next recession.

Economist as a plumber with a body camera (EPBC)

Economists could do more to show their work and to maintain their results.

Economists aren’t particularly transparent in their day-to-day activities, but what they do is very important. Because economics offers little possibility for laboratory-style experiments or hard-science precision, there is a perverse opportunity for conclusions to be reached before data are collected. This is dangerous and would be less likely if economists 1) reported what they do more frequently and more clearly, and 2) shifted some of their responsibilities toward maintenance of existing policy or past results.

Esther Duflo described the potential for a shift in the role economists play in society. Rather than being biased architects designing massive social policies, economists can be the people who are responsible for maintaining social systems and keeping them running smoothly from month-to-month, without any gaps or leaks. Economists can be more like plumbers.

The worst possible way to implement this shift in responsibility would be to hire a separate group of economists to fiddle with policy or day-trade things like the stock of unemployed. Instead, a practical way to do this is for existing (and particularly newly-minted) economists to “wear body cameras”, reporting more of their day-to-day work on personal or work websites or blogs, or on Twitter or GitHub.

Since poverty is deadly and bad policy causes suffering and death, economists and others involved in crafting public policy wield a lethal weapon. If they were, in effect, “recorded”, the way these well-paid individuals practice their trade would likely change, both in what results are presented and in what tasks are undertaken. And if it turns out that economists are already completely honest and unbiased, then “body cameras” would massively boost the field’s credibility.

There are now many free, open-source, and well-documented tools, like Python and R, for working with public economic data and contributing to analysis. Cloud storage also offers plenty of space to share the intermediate iterations of an analysis that never make it into a final table of regression coefficients. It would also help if more economists shared the code that produces their results, so that others can extend or modify it. Increasingly, technology is making it possible for economists to show more of their work.

Economists as plumbers with body cameras, in practice, also means following up very frequently on past work and on how existing systems are performing for all people (not just the aggregated/synthetic statistical “person”). For example, the economists who justify liberalization of US trade policy could be the ones responsible for resolving the local effects from the related factory closures. Economists pushing cuts to social services in response to debt levels could report monthly on what their policy does to both the debt level and the poverty rate. In essence, economists could do more checking-up on what they’ve done in the past and, if necessary, clean up after themselves.

The EPBC therefore has two goals: 1) encouraging economists to show how results were obtained, and 2) revisiting past results more frequently. To contribute to both, I’m publishing a series of Jupyter notebooks that show my recent attempts at working with public economic data using Python. For future projects and blog posts, I’ll link to the new notebooks and use the tag EPBC (for economist as plumber with body camera). I’m not sure if this will work out or be sustainable, but it’s an interesting idea and worth a try.

Here are the first three EPBC notebooks (Python 3.6):

Is Amazon killing Walmart?

US exports by trading partner

Efficiently reading fixed-width files like the CPS public use microdata
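One common way to read fixed-width records quickly in plain Python is to compile the column layout into a `struct` format once and unpack each line with it, rather than slicing strings field by field. This is a sketch of that idea, not necessarily the method used in the notebook; the field names and column positions are hypothetical, and real CPS positions must come from the Census data dictionary for each file.

```python
import struct

# HYPOTHETICAL record layout: (name, start, end), 1-indexed and inclusive,
# the way fixed-width data dictionaries list columns. Positions are
# illustrative only, not the actual CPS layout.
FIELDS = [("month", 1, 2), ("year", 3, 6), ("age", 10, 11)]


def make_parser(fields):
    """Compile a layout into a function that extracts only the listed fields."""
    fields = sorted(fields, key=lambda f: f[1])
    fmt, pos = ["="], 0  # "=" means standard sizes, no alignment padding
    for _, start, end in fields:
        gap = (start - 1) - pos
        if gap < 0:
            raise ValueError("fields overlap")
        if gap:
            fmt.append(f"{gap}x")  # skip unused columns
        fmt.append(f"{end - start + 1}s")  # capture this field as raw bytes
        pos = end
    record = struct.Struct("".join(fmt))
    names = [name for name, _, _ in fields]

    def parse(line):
        # unpack_from accepts lines longer than the layout (e.g. with "\n");
        # a line shorter than the layout raises struct.error.
        values = record.unpack_from(line)
        # Blank fields become None; others are converted to int.
        return {n: int(v) if v.strip() else None for n, v in zip(names, values)}

    return parse


def read_fixed_width(path, fields):
    """Yield one dict per record from a fixed-width file read in binary mode."""
    parse = make_parser(fields)
    with open(path, "rb") as f:
        for line in f:
            yield parse(line)


parse = make_parser(FIELDS)
print(parse(b"012018   34\n"))
```

Building the format string once keeps the per-line work to a single `unpack_from` call, and skipped columns (`x` codes) are never copied.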