Project MIRACLE

The first comprehensive subnational economic panel for South Korea's growth miracle. Township-level data on demographics, agriculture, industry, infrastructure, public finance, and education — from 1960 to 1989.

~2M archival pages
~3,500 townships
30 years
6+ domains

South Korea's transformation from one of the world's poorest countries to a high-income economy is among the most studied development stories, yet researchers have lacked granular, consistent subnational data spanning the period. MIRACLE fills this gap by digitising, harmonising, and geocoding municipal statistical yearbooks (시군 통계연보) held in Korean national and provincial archives, producing an annual panel at the township (읍·면·동) level.

MIRACLE starts with South Korea's municipal statistical yearbooks, but the ambition extends in two directions. First, within Korea, we plan to incorporate additional administrative sources — expressway construction logs, agricultural extension records, Korea Forest Service archives, colonial-era household registries, and local personnel files — to deepen the panel and enable research designs that link infrastructure, agricultural modernisation, and environmental policy to local institutional conditions.

Second, across countries, the infrastructure we build is designed to accommodate other growth miracle economies with comparable subnational statistical traditions. If similar municipal records exist for Taiwan, or district-level yearbooks for post-war Japan, they belong in the same framework. The goal is a comparative subnational data platform for studying rapid development wherever it has occurred.

Source material

Municipal statistical yearbooks (시군 통계연보), published annually by county and city governments. Many survive only as single physical copies in provincial archives, uncatalogued and deteriorating.

Farming area table, Namhae-gun (1969). Rice paddies and dry-field area by township. Hanja headings; later editions switch to hangul.
Population table, Kosung-gun (1975). Households and population by township.

Pipeline

From archival pages to analysis-ready panel data in four steps:

01

AI-OCR for mixed scripts

Custom pipeline fine-tuned for mixed Hangul/Hanja archival tables. 87% pilot accuracy, targeting 92–95%. This is what makes the project feasible — these documents were previously unusable at scale.

02

Variable harmonisation

Definitions, units, and table structures changed across editions and municipalities. We build crosswalks reconciling these into consistent time series across decades.
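To illustrate the unit side of harmonisation, the sketch below converts areas reported in 단보 (danbo, the unit printed on the 1969 Namhae-gun table) to hectares. It assumes 1 danbo = 300 pyeong and 1 pyeong = 400/121 m²; the crosswalk layout, the function name `to_hectares`, and the `UNIT_BY_EDITION` lookup are hypothetical and are not taken from the project's own code.

```python
from fractions import Fraction

# Assumed conversion factors (not stated in the source):
# 1 pyeong = 400/121 m^2, 1 danbo = 300 pyeong, 1 jeongbo = 10 danbo.
M2_PER_PYEONG = Fraction(400, 121)
M2_PER_UNIT = {
    "danbo": 300 * M2_PER_PYEONG,
    "jeongbo": 3000 * M2_PER_PYEONG,
    "ha": Fraction(10_000),
}


def to_hectares(value, unit):
    """Convert an area in the given source unit to hectares."""
    if unit not in M2_PER_UNIT:
        raise ValueError(f"no crosswalk entry for unit {unit!r}")
    return float(value * M2_PER_UNIT[unit] / 10_000)


# Hypothetical crosswalk: the unit used by each (municipality, edition year) table.
UNIT_BY_EDITION = {("남해군", 1969): "danbo"}

raw_total = 8054  # total cultivated area for 남해 as printed in the 1969 table
unit = UNIT_BY_EDITION[("남해군", 1969)]
print(round(to_hectares(raw_total, unit), 1))
```

Keeping the unit lookup separate from the conversion factors means a change of unit between editions only requires a new crosswalk entry, not new conversion code.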

03

Boundary concordances

Two major reorganisations (1963, 1973) plus dozens of smaller changes. We construct time-consistent miracle_id identifiers.
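A minimal sketch of how a concordance might map the township code printed in a given edition to a time-consistent `miracle_id`. Everything here is hypothetical: the source codes, validity intervals, and identifiers are invented for illustration, and the actual concordance format is not described on this page. The example covers a merger only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concordance:
    source_code: str  # township code as printed in an edition (hypothetical)
    valid_from: int   # first year the code is in use
    valid_to: int     # last year the code is in use (inclusive)
    miracle_id: str   # time-consistent identifier


# Hypothetical case: two townships merge in a 1973 reorganisation, so both
# pre-1973 codes and the post-1973 code map to one stable unit.
CONCORDANCE = [
    Concordance("OLD-A", 1960, 1972, "KR-00-000-010"),
    Concordance("OLD-B", 1960, 1972, "KR-00-000-010"),
    Concordance("NEW-AB", 1973, 1989, "KR-00-000-010"),
    Concordance("OLD-C", 1960, 1989, "KR-00-000-020"),
]


def lookup(source_code, year):
    """Return the miracle_id for a source code in a given year."""
    matches = [c.miracle_id for c in CONCORDANCE
               if c.source_code == source_code and c.valid_from <= year <= c.valid_to]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} concordance rows for {source_code!r} in {year}")
    return matches[0]


def aggregate(rows):
    """Sum (source_code, year, value) rows to (miracle_id, year) totals."""
    totals = {}
    for code, year, value in rows:
        key = (lookup(code, year), year)
        totals[key] = totals.get(key, 0) + value
    return totals


panel = aggregate([
    ("OLD-A", 1972, 1200), ("OLD-B", 1972, 800),  # separate units before the merger
    ("NEW-AB", 1973, 2050),                       # single unit afterwards
])
print(panel)
```

Aggregating to the largest stable unit handles mergers for additive variables such as population. Splits would need apportionment weights, which this sketch does not attempt.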

04

Geocoding & GIS

Every township linked to satellite, elevation, slope, soil, and transport network data. 196 Namhae-gun villages fully geocoded.

Source: 경지면적현황 (cultivated area), 남해군 통계연보 (Namhae-gun statistical yearbook), 1969. Mixed Hangul/Hanja table with vertical headers.
農 業 ~22  ← Page number confusion
2 경 지 면 적 현 황  ← Vertical text → individual chars
(단위 :단보)
구분 합 게 등 게 답 1포작 2포작 전 미 합게  ← Nested headers flattened
면별 8,054 5,849 1,230 4,619 2,205 444  ← Row-column mapping unclear
남 해 10,993 7,544 1,175 6,369 3,449 890  ← Numbers may be misaligned
설 동 11,470 6,012 857 5,155 5,458 841  ← '삼동' → '설동' misrecognised
남 9,553 5,891 1,101 4,790 3,662 522  ← Township name truncated
저 현 9,564 9,705 550 6,145 2,859 955  ← '고현' split across lines
창 13,173 7,901 2,311 5,590 5,272  ← '창선' → '창' only
⚠ Vertical headers completely failed. Table structure unrecoverable.
Layout parsing failure: nested headers flattened, column-data mapping lost.
Vertical text failure: vertical Korean split into individual characters.
Cell mapping errors: numbers detached from columns.
Same source: context-aware layout parsing + structured output
Step 1: Layout. Table regions, header hierarchy, vertical text
Step 2: Context OCR. '경지면적' context corrects '설동'→'삼동'
Step 3: Structure. Nested headers → hierarchical CSV
Step 4: Validate. Row totals = column totals; cross-ref
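One plausible way to implement the name correction in Step 2 is to match each OCR reading against a gazetteer of known township names. The sketch below is an assumption, not the project's pipeline: it swaps in a simple fuzzy match from Python's standard library, comparing names at the jamo (letter) level so that a single misread vowel or final consonant still scores as close. The `GAZETTEER` list, `correct_name`, and the 0.5 cutoff are all illustrative.

```python
import unicodedata
from difflib import SequenceMatcher

# Township names taken from the structured output for Namhae-gun (1969).
GAZETTEER = ["남해", "이동", "삼동", "남면", "고현", "창선"]


def jamo(text):
    """Drop spaces and decompose Hangul syllables into jamo (Unicode NFD)."""
    return unicodedata.normalize("NFD", text.replace(" ", ""))


def correct_name(ocr_text, candidates=GAZETTEER, cutoff=0.5):
    """Return the closest gazetteer name, or None if nothing clears the cutoff."""
    scored = [(SequenceMatcher(None, jamo(ocr_text), jamo(name)).ratio(), name)
              for name in candidates]
    score, best = max(scored)
    return best if score >= cutoff else None


print(correct_name("설 동"))  # misread name from the raw OCR output
print(correct_name("저 현"))
```

Syllable-level matching would tie '설동' between '삼동' and '이동', since each shares exactly one syllable; decomposing to jamo breaks the tie. A truncated reading such as '남' stays ambiguous between 남해 and 남면, so string similarity alone is not enough and row position or totals would have to settle it.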
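Structured output: 경지면적현황, 남해군 (1969). Unit: 단보.
Columns: 읍면 = township; 합계 = total; 논 (답) = paddy, split into 소계 (subtotal), 1모작 (single-crop) and 2모작 (double-crop); 밭 (전) = dry field.

읍면 | 합계 | 논 (답) 소계 | 논 (답) 1모작 | 논 (답) 2모작 | 밭 (전) | Check
남해 | 8,054 | 5,849 | 1,230 | 4,619 | 2,205 | ✓ balanced
이동 | 10,993 | 7,544 | 1,175 | 6,369 | 3,449 | ✓ balanced
삼동 | 12,785 | 7,349 | 1,239 | 6,110 | 5,436 | ✓ balanced
남면 | 11,470 | 6,012 | 857 | 5,155 | 5,458 | ✓ balanced
고현 | 8,310 | 5,680 | 743 | 4,937 | 2,630 | ✓ balanced
창선 | 13,173 | 7,901 | 2,311 | 5,590 | 5,272 | ✓ balanced

All township names correct. Nested headers preserved. Row-level cross-validation passed.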
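The "balanced" check can be sketched as two accounting identities per row: the paddy subtotal should equal single-crop plus double-crop area, and the total should equal paddy plus dry field. The column order assumed here (total, paddy subtotal, single-crop, double-crop, dry field) is a reading of the structured output above, and `check_row` is an illustrative name rather than the project's validator.

```python
# Rows from the structured output for Namhae-gun, 1969 (unit: danbo):
# (township, total, paddy subtotal, single-crop paddy, double-crop paddy, dry field)
ROWS = [
    ("남해", 8054, 5849, 1230, 4619, 2205),
    ("이동", 10993, 7544, 1175, 6369, 3449),
    ("삼동", 12785, 7349, 1239, 6110, 5436),
    ("남면", 11470, 6012, 857, 5155, 5458),
    ("고현", 8310, 5680, 743, 4937, 2630),
    ("창선", 13173, 7901, 2311, 5590, 5272),
]


def check_row(row):
    """Return the list of accounting identities that fail for one row."""
    _name, total, paddy, single, double, dry = row
    problems = []
    if single + double != paddy:
        problems.append("paddy subtotal != single-crop + double-crop")
    if paddy + dry != total:
        problems.append("total != paddy + dry field")
    return problems


for row in ROWS:
    problems = check_row(row)
    print(row[0], "balanced" if not problems else problems)

# A row taken from the unstructured OCR output fails both identities.
bad = ("저 현", 9564, 9705, 550, 6145, 2859)
print(check_row(bad))
```

Checks like these catch misaligned or misread digits without any ground truth, because the printed tables carry their own redundancy.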

Output

Each row is a township-year observation linked by miracle_id:

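miracle_id | year | prov | muni | twp | pop | hh | paddy_ha | schools | road_km
KR-48-840-010 | 1970 | 경남 | 남해군 | 남해읍 | 28,412 | 5,680 | 1,245 | 7 | 23.4
KR-48-840-010 | 1975 | 경남 | 남해군 | 남해읍 | 25,891 | 5,320 | 1,198 | 8 | 31.7
KR-48-840-010 | 1980 | 경남 | 남해군 | 남해읍 | 22,105 | 5,010 | 1,152 | 8 | 38.2
KR-47-720-030 | 1970 | 경북 | 영주시 | 풍기읍 | 31,550 | 6,140 | 1,870 | 9 | 18.6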

Pilot release (Gyeongbu corridor, ~400 townships): late 2026. CSV & Stata.

The full dataset is organised into modules by domain, each a flat panel at the township-year level with consistent miracle_id identifiers. Merge across modules using Core Keys.
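A minimal sketch of a cross-module merge, assuming each module ships as a CSV keyed by `miracle_id` and `year`. The values are taken from the sample rows above, but the split of columns between modules, the inline CSV text, and the helper names `read_module` and `merge` are assumptions for illustration. Only the standard library is used.

```python
import csv
import io

# Hypothetical module extracts; real file names and layouts may differ.
DEMOGRAPHICS_CSV = """miracle_id,year,pop,hh
KR-48-840-010,1970,28412,5680
KR-48-840-010,1975,25891,5320
KR-47-720-030,1970,31550,6140
"""
AGRICULTURE_CSV = """miracle_id,year,paddy_ha
KR-48-840-010,1970,1245
KR-48-840-010,1975,1198
KR-47-720-030,1970,1870
"""


def read_module(text):
    """Index a module's rows by the (miracle_id, year) key."""
    reader = csv.DictReader(io.StringIO(text))
    return {(r["miracle_id"], int(r["year"])): r for r in reader}


def merge(left, right):
    """Inner-join two modules on (miracle_id, year)."""
    return {key: {**left[key], **right[key]} for key in left.keys() & right.keys()}


panel = merge(read_module(DEMOGRAPHICS_CSV), read_module(AGRICULTURE_CSV))
row = panel[("KR-48-840-010", 1970)]
print(row["pop"], row["paddy_ha"])
```

An inner join silently drops township-years missing from either module, so comparing the merged row count against each input is a sensible first check.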

Module (variables) | Description | ETA
Core Keys (miracle_id · province · municipality · township · concordances) | Geographic identifiers and boundary concordances across the 1963/1973 reorganisations. | 2026
Demographics (population · households · age structure) | Population counts, household numbers, demographic composition. | 2026
Agriculture (paddy area · crop output · livestock) | Cultivated area, output (harmonised to metric units), livestock. | 2026
Industry (establishments · employment · output) | Industrial establishments, manufacturing employment, sectoral output. | 2027
Infrastructure (roads · electricity · water · telecom) | Road length, electrification, public utilities. | 2027
Public Finance (revenues · expenditures · transfers) | Municipal revenue/expenditure, central transfers, fiscal capacity. | 2027
Education (schools · enrolment · teachers) | School counts, enrolment, teachers, educational infrastructure. | 2027
Geospatial (shapefiles · centroids · boundaries) | GIS boundary files with consistent township geometries. | 2027
Institutions (clan concentration · bureaucratic capacity) | Pre-treatment institutional measures from 1930 registries and personnel files. | 2028
Public data explorer: an interactive dashboard for browsing county-level data, in development.

Digitisation proceeds province by province.

[Map: digitisation status by province (경기, 강원, 서울, 충북, 충남, 경북, 경남, 전북, 전남, 제주, 부산); stylised, not to scale. 남해군 pilot: 196 villages geocoded. Legend: Pilot complete, Digitising, Sources located, Planned.]

Last updated March 2026

2023–24 (Done): Source identification. AI-OCR pipeline. 196 villages geocoded in Namhae-gun. Partnerships with KDI and Sogang.
2025 (Active): Systematic digitisation. OCR fine-tuning. Variable harmonisation. GIS boundary reconciliation.
Late 2026 (Planned): Pilot release: Gyeongbu Expressway corridor townships. Core Keys and initial domain modules.
2027–28 (Planned): Full national coverage. Additional archival sources. Expansion to other growth miracle economies.

MIRACLE enables research designs that were previously impossible, including the first causal analysis of the Gyeongbu Expressway, the Saemaul Undong (R&R, JPE), Korea's high-yield rice revolution, and one of history's largest reforestation programmes.

Other applications. Geography of industrialisation, education expansion, fiscal transfers, land reform, environmental policy, developmental states. Using MIRACLE data? Let us know.

Principal Investigator

BooKang Seol

설북강
Postdoctoral Researcher
Dept. of Intl Development, LSE
bookangseol.com
Co-Investigator

Changkeun Lee

이창근
Korea Development Institute (KDI)
Co-Investigator

Hyunjoo Yang

양현주
Dept. of Economics
Sogang University

Interested in contributing?

Partners

KDI: Korea Development Institute
LSE: London School of Economics
Sogang: Sogang University
STEG: Structural Transformation & Economic Growth

Questions about the data, collaboration opportunities, or early access — we'd love to hear from you.


Citation

Seol, BooKang, Changkeun Lee, and Hyunjoo Yang. "Project MIRACLE: Subnational Economic Data for South Korea's Developmental Period, 1960–1989." London School of Economics, 2026.

@techreport{seol2026miracle,
  author = {Seol, BooKang and Lee, Changkeun and Yang, Hyunjoo},
  title = {Project {MIRACLE}: Subnational Economic Data for South Korea's Developmental Period, 1960--1989},
  institution = {London School of Economics},
  year = {2026}
}