MIRACLE

South Korea's post-war transformation is among the most remarkable growth miracles of the twentieth century. Yet rigorous empirical research on how it unfolded has been held back by data that is fragmented across provincial archives, recorded in mixed Korean and classical Chinese scripts, and scrambled by repeated boundary changes. MIRACLE is a multi-year effort to assemble the first consistent township-year economic panel for this era. In its first phase, the project is collecting and digitising ~2 million pages of municipal statistical yearbooks—published annually by county governments but never systematically compiled—into a public repository with time-consistent administrative boundaries. It is designed to lower the barriers to empirical research on a defining development episode.

~3,500 townships
30 annual panels
100+ variables
~2M pages of archives
3+ archival source types
bookangseol/miracle-korea
Approach

MIRACLE integrates multiple archival source types into a single framework built around time-consistent geographic identifiers. Each township carries a miracle_id that tracks it through South Korea's two major boundary reorganisations (1963, 1973) and dozens of smaller changes, allowing researchers to follow the same unit across three decades without manually reconciling administrative maps. The municipal statistical yearbooks form the natural backbone; they provide the richest and most consistent subnational coverage for this period. Additional archival layers, from forest type maps to foreign loan records and New Village Movement documents, extend the panel into domains the yearbooks do not reach.

MIRACLE starts with South Korea's municipal statistical yearbooks, but the ambition extends in two directions. First, within Korea, we plan to incorporate additional administrative sources — expressway construction logs, agricultural extension records, Korea Forest Service archives, colonial-era household registries, and local personnel files — to deepen the panel and enable research designs that link infrastructure, agricultural modernisation, and environmental policy to local institutional conditions.

Second, across countries, the infrastructure we build is designed to accommodate other growth miracle economies with comparable subnational statistical traditions. If similar municipal records exist for Taiwan, or district-level yearbooks for post-war Japan, they belong in the same framework. The goal is a comparative subnational data platform for studying rapid development wherever it has occurred.

  • Municipal Statistical Yearbooks (통계연보) [Core, digitising now]
    Township-level demographics, agriculture, industry, infrastructure, public finance, and education. Published annually by county governments.
  • Forest Type Maps (임상도) [Collected]
    Korea Forest Service spatial archives. Shapefiles are available for the 조선임야분포도 (Joseon Forest Distribution Map, 1910) and for forest type maps from 1974 onward. Enables the study of one of history's largest reforestation programmes.
  • New Village Survey (새마을총람) [Collected]
    Comprehensive village-level survey published in 1972 by the Ministry of Home Affairs (내무부). Covers all ~35,000 villages with township-level maps, distance tables, and detailed statistics on households, population, land, and infrastructure. Led and digitised by Hyunjoo Yang (양현주, Sogang University).
  Future archival layers (not yet in active digitisation):
  • Loan Relation Files (차관관계철) [Planned]
    Foreign loan records linking firms to locations and financing sources. Supports geocoding firms and mapping industrial networks.
  • Agricultural Extension Records [Planned]
    Farm-level adoption of high-yield rice varieties and extension programme participation.
  • Expressway Construction Logs [Planned]
    Construction timelines and route data for the Gyeongbu Expressway and subsequent motorway network.
  • Personnel Files [Planned]
    Local government personnel records. A source for measures of bureaucratic capacity and institutional quality.
Data sources

MIRACLE draws on multiple archival source types. Municipal statistical yearbooks form the backbone; additional layers are planned.

  • Municipal Statistical Yearbooks (통계연보) [Core, digitising now]
    Township-level demographics, agriculture, industry, and public finance. Published annually by every county government.
  • Forest Type Maps (임상도) [Collected]
    Korea Forest Service spatial archives from 1910 and 1974 onward. Enables research on large-scale reforestation.
  • Loan Relation Files (차관관계철) [Planned]
    Foreign loan records linking firms to locations and financing sources.
  • New Village Survey (새마을총람) [Collected]
    Comprehensive village-level survey published in 1972 by the Ministry of Home Affairs (내무부), covering all ~35,000 villages in South Korea. Led and digitised by Hyunjoo Yang.


Pipeline — Yearbook Digitisation

From archive to analysis-ready panel in six steps:

01  Archive discovery & source identification [Done]

Systematic survey of provincial archives, university libraries, and government collections to locate surviving yearbook volumes. Mapping what exists, what is missing, and where physical copies are held.

02  Outreach & scanning [Largely complete]

Building partnerships with municipalities, counties, and provincial archives. Physical scanning of bound volumes into high-resolution page images — the raw input for digitisation.

03  AI-OCR for mixed scripts [Current focus]

Custom pipeline fine-tuned for mixed Hangul/Hanja archival tables. 87% pilot accuracy, targeting 92–95%. This is what makes the project feasible — these documents were previously unusable at scale.

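Structured output: 경지면적현황 (cultivated area status), 남해군 (Namhae-gun), 1969

읍면         합계    논 (답)                       밭 (전)
                     소계    1모작    2모작
남해        8,054    5,849    1,230    4,619    2,205  ✓ balanced
이동       10,993    7,544    1,175    6,369    3,449  ✓ balanced
삼동       12,785    7,349    1,239    6,110    5,436  ✓ balanced
남면       11,470    6,012      857    5,155    5,458  ✓ balanced
고현        8,310    5,680      743    4,937    2,630  ✓ balanced
창선       13,173    7,901    2,311    5,590    5,272  ✓ balanced

Columns: 읍면 = township; 합계 = total; 논 (답) = paddy, split into 소계 (subtotal), 1모작 (single-cropped), and 2모작 (double-cropped); 밭 (전) = dry field.
All township names correct. Nested headers preserved. Row-level cross-validation passed.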
Source: 경지면적현황, 남해군 통계연보 (1969), a mixed Hangul/Hanja table with vertical headers

 1  農 業 ~22                                        ← Page number confusion
 2  2 경 지 면 적 현 황                              ← Vertical text → individual chars
 3  (단위 :단보)
 4  구분 합 게 등 게 답 1포작 2포작 전 미 합게       ← Nested headers flattened
 5  면별 8,054 5,849 1,230 4,619 2,205 444           ← Row-column mapping unclear
 6  남 해 10,993 7,544 1,175 6,369 3,449 890         ← Numbers may be misaligned
 7  설 동 11,470 6,012 857 5,155 5,458 841           ← '삼동' → '설동' misrecognised
 8  남 9,553 5,891 1,101 4,790 3,662 522             ← Township name truncated
 9  저 현 9,564 9,705 550 6,145 2,859 955            ← '고현' split across lines
10  창 13,173 7,901 2,311 5,590 5,272                ← '창선' → '창' only

⚠ Vertical headers completely failed. Table structure unrecoverable.
  • Layout parsing failure
    Nested headers flattened; column-data mapping lost
  • Vertical text failure
    Vertical Korean split into individual characters
  • Cell mapping errors
    Numbers detached from columns
Same source, processed with context-aware layout parsing and structured output:
  • Step 1: Layout
    Table regions, header hierarchy, vertical text
  • Step 2: Context OCR
    '경지면적' context corrects '설동'→'삼동'
  • Step 3: Structure
    Nested headers → hierarchical CSV
  • Step 4: Validate
    Row totals = column totals; cross-referencing
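
The context correction in Step 2 can be approximated, for illustration only, by matching each recognised township name against a list of names known to exist in the county. The sketch below is not the project's OCR pipeline: it swaps in a simple string-similarity lookup, compares names at the jamo level so that a syllable with one misread vowel or final consonant still scores as close, and uses a threshold that is an arbitrary assumption. The names `KNOWN_TOWNSHIPS`, `to_jamo`, and `correct_township` are hypothetical.

```python
import unicodedata
from difflib import SequenceMatcher

# Township names as they appear in the structured output for 남해군 (1969).
KNOWN_TOWNSHIPS = ["남해", "이동", "삼동", "남면", "고현", "창선"]


def to_jamo(text: str) -> str:
    """Drop spaces and decompose Hangul syllables into their component jamo."""
    return unicodedata.normalize("NFD", text.replace(" ", ""))


def correct_township(ocr_text: str, threshold: float = 0.6):
    """Return the most similar known township, or None if none is close enough.

    The 0.6 threshold is an illustrative assumption, not a project setting.
    """
    best_name, best_score = None, 0.0
    for name in KNOWN_TOWNSHIPS:
        score = SequenceMatcher(None, to_jamo(ocr_text), to_jamo(name)).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


print(correct_township("설 동"))  # the misread name from line 7 of the raw OCR
print(correct_township("합계"))   # a header word, which should not map to a township
```

A lookup like this only works when the set of valid names is known in advance, which is one reason a township gazetteer matters for the later concordance step.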

See structured output table above.
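
The "✓ balanced" flags in the structured output can be reproduced with a simple row-level accounting check. The sketch below assumes the column order 합계, 소계, 1모작, 2모작, 밭 and tests two identities: paddy subtotal = single-cropped + double-cropped, and total = paddy subtotal + dry field. The function names are hypothetical, and the project's own validation may include further checks such as column totals.

```python
def parse_int(cell: str) -> int:
    """Convert a yearbook cell such as '8,054' to an integer."""
    return int(cell.replace(",", ""))


def is_balanced(total: int, paddy: int, single: int, double: int, dry: int) -> bool:
    """Check that the sub-components of a row add up to its stated totals."""
    return paddy == single + double and total == paddy + dry


# Rows from the structured output: 합계, 소계, 1모작, 2모작, 밭.
STRUCTURED = {
    "남해": ["8,054", "5,849", "1,230", "4,619", "2,205"],
    "이동": ["10,993", "7,544", "1,175", "6,369", "3,449"],
    "삼동": ["12,785", "7,349", "1,239", "6,110", "5,436"],
    "남면": ["11,470", "6,012", "857", "5,155", "5,458"],
    "고현": ["8,310", "5,680", "743", "4,937", "2,630"],
    "창선": ["13,173", "7,901", "2,311", "5,590", "5,272"],
}

# First five numbers from line 9 of the raw OCR output, for comparison.
RAW_LINE_9 = ["9,564", "9,705", "550", "6,145", "2,859"]

for township, cells in STRUCTURED.items():
    print(township, is_balanced(*map(parse_int, cells)))
print("raw line 9", is_balanced(*map(parse_int, RAW_LINE_9)))
```

A check of this kind catches misread digits, but not rows whose numbers are internally consistent yet attached to the wrong township, so it complements rather than replaces name and layout validation.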

04  Variable harmonisation [Current focus]

Definitions, units, and table structures changed across editions and municipalities. We build crosswalks reconciling these into consistent time series.
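
As a minimal sketch of what such a crosswalk might look like, the example below maps edition-specific column labels to a harmonised variable name and converts areas to hectares. Everything here is an assumption for illustration: the label mapping, the name `dryfield_ha`, and the helper `harmonise` are hypothetical, and the conversion factors use the conventional equivalences (1 평 = 400/121 m², 1 단보 = 300 평, 1 정보 = 3,000 평), which should be checked against the project codebook.

```python
PYEONG_M2 = 400 / 121  # conventional size of one 평 (pyeong) in square metres

# Conversion factors from reported area units to hectares (assumed conventions).
AREA_TO_HA = {
    "ha": 1.0,
    "단보": 300 * PYEONG_M2 / 10_000,
    "정보": 3_000 * PYEONG_M2 / 10_000,
}

# Hypothetical crosswalk from labels used in different editions to one variable name.
VARIABLE_CROSSWALK = {
    "논": "paddy_ha",
    "답": "paddy_ha",
    "밭": "dryfield_ha",
    "전": "dryfield_ha",
}


def harmonise(label: str, value: float, unit: str) -> tuple:
    """Return (harmonised variable name, value in hectares rounded to 2 decimals)."""
    return VARIABLE_CROSSWALK[label], round(value * AREA_TO_HA[unit], 2)


# The same variable reported under two labels and two units.
print(harmonise("답", 5849, "단보"))
print(harmonise("논", 580.07, "ha"))
```

Keeping labels and units in separate lookup tables means a new edition usually needs only a few extra entries, not new conversion code.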

05  Boundary concordances [Pilot complete]

Two major reorganisations (1963, 1973) plus dozens of smaller changes. We construct time-consistent miracle_id identifiers.
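
One way to picture a concordance is as a table of validity intervals that maps each unit name, as printed in a given year, to a stable identifier. The sketch below is an assumption about structure, not the project's actual method: the unit names, years, and identifiers are invented, and it assumes that units which merge are aggregated to the combined area so that totals stay comparable over time.

```python
from collections import defaultdict

# Hypothetical concordance rows: (unit name as printed, first year, last year, miracle_id).
CONCORDANCE = [
    ("A-myeon", 1960, 1972, "KR-00-000-010"),
    ("B-myeon", 1960, 1972, "KR-00-000-010"),  # merged with A-myeon in 1973
    ("AB-eup", 1973, 1989, "KR-00-000-010"),
    ("C-myeon", 1960, 1989, "KR-00-000-020"),
]


def miracle_id_for(unit: str, year: int) -> str:
    """Look up the time-consistent identifier for a unit name in a given year."""
    for name, start, end, miracle_id in CONCORDANCE:
        if name == unit and start <= year <= end:
            return miracle_id
    raise KeyError(f"no concordance entry for {unit!r} in {year}")


def to_consistent_panel(records):
    """Sum (unit, year, population) records to (miracle_id, year) cells."""
    panel = defaultdict(int)
    for unit, year, population in records:
        panel[(miracle_id_for(unit, year), year)] += population
    return dict(panel)


records = [
    ("A-myeon", 1970, 1000),
    ("B-myeon", 1970, 500),
    ("AB-eup", 1975, 1600),
    ("C-myeon", 1970, 800),
]
print(to_consistent_panel(records))
```

Summing works for counts such as population; rates or averages would need weights, which is a design choice this sketch does not address.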

06  Geocoding & GIS [Pilot complete]

Every township linked to satellite, elevation, slope, soil, and transport network data. 196 Namhae-gun villages fully geocoded.


Output

The dataset is organised into modules by domain, each a flat township-year panel. Merge across modules using Core Keys. CSV, Stata & Python, with full codebook and variable documentation.

miracle_id     year  prov  muni    twp     pop     hh     paddy_ha  schools  road_km
KR-48-840-010  1970  경남  남해군  남해읍  28,412  5,680  1,245     7        23.4
KR-48-840-010  1975  경남  남해군  남해읍  25,891  5,320  1,198     8        31.7
KR-48-840-010  1980  경남  남해군  남해읍  22,105  5,010  1,152     8        38.2
KR-47-720-030  1970  경북  영주시  풍기읍  31,550  6,140  1,870     9        18.6
Illustrative example — pilot data release late 2026.
Modules (variables, description, ETA):
  • Core Keys [ETA 2026]
    miracle_id · province · municipality · township · concordances
    Geographic identifiers and boundary concordances across the 1963/1973 reorganisations.
  • Demographics [ETA 2026]
    population · households · age structure
    Population counts, household numbers, demographic composition.
  • Agriculture [ETA 2026]
    paddy area · crop output · livestock
    Cultivated area, output (harmonised to metric units), livestock.
  • Industry [ETA 2027]
    establishments · employment · output
    Industrial establishments, manufacturing employment, sectoral output.
  • Infrastructure [ETA 2027]
    roads · electricity · water · telecom
    Road length, electrification, public utilities.
  • Public Finance [ETA 2027]
    revenues · expenditures · transfers
    Municipal revenue/expenditure, central transfers, fiscal capacity.
  • Education [ETA 2027]
    schools · enrolment · teachers
    School counts, enrolment, teachers, educational infrastructure.
  • Geospatial [ETA 2027]
    shapefiles · centroids · boundaries
    GIS boundary files with consistent township geometries.
  • Institutions [ETA 2028]
    clan concentration · bureaucratic capacity
    Pre-treatment institutional measures from 1930 registries and personnel files.
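
Because each module is a flat township-year panel, combining modules amounts to a join on the shared keys. The sketch below uses only the Python standard library and the illustrative figures for 남해읍 shown above. It assumes the join key is (miracle_id, year) and that module rows arrive as dictionaries, for instance from `csv.DictReader`; the function `merge_modules` is hypothetical and not part of any released tooling.

```python
KEYS = ("miracle_id", "year")

# Illustrative rows taken from the example panel; real module files may differ.
demographics = [
    {"miracle_id": "KR-48-840-010", "year": 1970, "pop": 28412, "hh": 5680},
    {"miracle_id": "KR-48-840-010", "year": 1975, "pop": 25891, "hh": 5320},
]
agriculture = [
    {"miracle_id": "KR-48-840-010", "year": 1970, "paddy_ha": 1245},
    {"miracle_id": "KR-48-840-010", "year": 1975, "paddy_ha": 1198},
]


def merge_modules(*modules):
    """Outer-join module rows on (miracle_id, year), returning rows sorted by key."""
    merged = {}
    for module in modules:
        for row in module:
            key = tuple(row[k] for k in KEYS)
            merged.setdefault(key, {}).update(row)
    return [merged[key] for key in sorted(merged)]


panel = merge_modules(demographics, agriculture)
for row in panel:
    print(row)
```

An outer join keeps township-years that appear in only one module, which seems the safer default while coverage still differs across modules.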
Public data explorer: an interactive dashboard for browsing county-level data, in development.
Pilot release: late 2026. Gyeongbu Expressway corridor (~400 townships). Core Keys, Demographics, and Agriculture modules. CSV, Stata & Python. Request early access.
Current Status — Yearbook Scanning

Digitisation proceeds province by province, constrained by the uneven survival of physical yearbooks across Korea's provincial archives.

  • 2023–24 [Done]
    Source survey across provincial archives. AI-OCR pipeline developed. Namhae-gun pilot: 196 villages geocoded. Partnerships with KDI, Sogang.
  • 2025 [Done]
    STEG & LSE seed funding secured. Systematic scanning begins. OCR fine-tuning (87% accuracy). Variable harmonisation framework. Boundary concordance with Academy of Korean Studies.
  • 2026 [Active]
    Pilot data release: Gyeongbu Expressway corridor (~400 townships). Core Keys, Demographics, Agriculture modules. Public data explorer. Hiring RAs for 2026–27.
  • 2027–28 [Planned]
    Full national coverage. Industry, Infrastructure, Public Finance, Education modules. Additional archival sources. Expansion to other growth miracle economies.
[Map: yearbook scanning coverage by province (경기, 강원, 충북, 충남, 전북, 전남, 경북, 경남, 제주, 서울, 부산), based on administrative boundaries, with the Namhae County pilot (196 villages geocoded) marked. Legend: largely complete (≥80%); digitising.]

Last updated March 2026

Research

South Korea compressed into three decades a sequence of transformations that most developing countries pursue individually: transport infrastructure, agricultural modernisation, environmental restoration, and political liberalisation. MIRACLE's panel structure makes it possible to ask not only how each of these interventions worked, but whether the same local conditions shaped their effectiveness across domains, and if so, whether that regularity is something policymakers can identify in advance.

Existing work

Seol, BooKang (2026). "The Endogenous Returns to Infrastructure: Social Institutions and the Choice of Development Paths."

The first paper using MIRACLE data studies how pre-existing social institutions—measured from colonial-era household registries—shaped the local returns to Korea's national rural development programme, the Saemaul Movement. It shows that these institutions did not simply strengthen or weaken the effects of new infrastructure. They shaped which forms of infrastructure became economically valuable by influencing the development strategies communities pursued and the bottlenecks they faced.

Research agenda

Korea's Gyeongbu Expressway (1968–70) absorbed more than 20% of the national budget and reshaped the country's economic geography. MIRACLE tracks townships along the corridor before, during, and after construction to examine how pre-existing social institutions shaped whether new connectivity led to industrial growth, out-migration, or persistent stagnation.

Canonical dual-economy models, from Lewis (1954) through Harris and Todaro (1970), predict that agricultural productivity gains release labour from farming and push workers toward cities. Korea offers a counterexample: the nationwide rollout of high-yield Tongil rice varieties between 1971 and 1979 generated one of the fastest recorded increases in agricultural productivity, yet rural incomes converged toward urban levels during rapid industrialisation. MIRACLE links crop output, cultivated area, extension timing, and non-farm employment at the township level to distinguish among competing channels: labour reallocation into cities, local demand linkages within rural economies, and capital transfers from the state.

Korea restored 2.8 million hectares of forest between 1962 and 1987 while industrialising from extreme poverty, a trajectory that challenges the Environmental Kuznets Curve hypothesis that ecological recovery follows prosperity. Forests exhibit strong spatial complementarities: isolated patches of trees provide negligible ecosystem services, but the same trees within a contiguous canopy deliver watershed protection, soil stability, and microclimate regulation. Successful reforestation may therefore require spatially coordinated intervention (an environmental analogue to the "big push") rather than marginal incentives alone. MIRACLE links Korea Forest Service spatial archives to township-level economic data to trace how land-use restrictions, coal briquette distribution, and administrative coordination shaped both forest recovery and local economic adjustment. A STEG Small Research Grant is supporting the construction of a 400-township panel for this component.

Korea democratised in 1987 after barely two decades of industrial growth. This pace challenges gradualist accounts of democratic change, but aligns uncomfortably well with modernisation theory's prediction that development itself generates political transformation. MIRACLE provides the micro-data to examine three competing channels at the local level: the diffusion of Protestant missions, which established some of the country's earliest organisations based on broad participation; the Saemaul Movement's village councils, which may have built democratic capacity through collective decision-making under authoritarian rule; and structural transformation itself, through urbanisation, education, and the emergence of a middle class. Because each channel implies a different geography and timeline of political mobilisation, the panel allows them to be tested against one another rather than studied in isolation. Distinguishing among them reveals whether Korea's democratic transition was a rupture or the culmination of institutional changes already under way.

A diagnostic framework under development with KDI aims to use the historical relationship between institutional endowments and programme returns to help practitioners assess where national investments are most likely to succeed before resources are committed.

If you are using or interested in using MIRACLE data, we would like to hear from you. Get in touch.

Team
  • BooKang Seol (설부강), Project Lead
    Postdoctoral Researcher, LSE
  • Changkeun Lee (이창근), Project Lead
    KDI School of Public Policy and Management
  • Hyunjoo Yang (양현주), Project Lead
    Dept. of Economics, Sogang University
  • Hoyeon Byun (변호연), Research Associate
    Ph.D. Candidate in Economics, SNU
  • Inhwan Park (박인환), Research Associate
    Ph.D. Candidate in Economics, Sogang University
  • Yoorim Son (손유림), Research Associate
    Master's student, Sogang University

Hiring research associates for 2026–27. Get in touch.

Partners

Korea Development Institute, Sogang University

Funded by

London School of Economics, STEG

For early access, collaboration, or questions, please get in touch.