Political Science Academic Job Market
An NLP Analysis of APSA eJobs Listings
Executive Summary
This report analyzes the political science academic job market using all active listings published in APSA Political Science Jobs, the official monthly eJobs journal of the American Political Science Association. Subfield counts are taken directly from each issue’s Table of Contents (page 2) for accuracy, while individual job records are scraped from the body text and verified against those ground-truth counts.
0.1 Dataset at a Glance
0.2 Geographic Overview
1 Data Collection & Parsing
1.1 Methodology
The data pipeline operates in three stages:
- PDF Extraction: `pdftools::pdf_text()` reads raw text from each monthly issue.
- Page 2 TOC Parsing: A regex pipeline targeting `(N listings)` on the Table of Contents extracts the official, ground-truth count per subfield per issue.
- Body Text Scraping: The parser walks the document line-by-line, tracking section headers to assign subfields, flushing a record each time it detects an `eJobs ID:` marker.
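The second and third stages can be sketched in base R as follows. This is a minimal illustration, not the report's actual parser: the toy line layouts, the helper names `parse_toc_counts()` and `scrape_records()`, and the assumption that the `eJobs ID:` marker closes a record are all hypothetical.

```r
# Toy stand-ins for pdftools::pdf_text() output. The real TOC and body
# layouts are not shown in this report, so these formats are assumptions.
toc_page <- c(
  "Table of Contents",
  "American Government and Politics (12 listings) ........ 3",
  "Comparative Politics (8 listings) ........ 9",
  "Public Law (1 listing) ........ 14"
)
body_lines <- c(
  "American Government and Politics",
  "Assistant Professor, Example University",
  "eJobs ID: 1001",
  "Lecturer, Sample College",
  "eJobs ID: 1002",
  "Comparative Politics",
  "Assistant Professor, Placeholder State University",
  "eJobs ID: 1003"
)

# Stage 2: extract the subfield name and the "(N listings)" count per TOC line.
parse_toc_counts <- function(lines) {
  pattern <- "^\\s*(.+?)\\s*\\((\\d+) listings?\\)"
  hits <- regmatches(lines, regexec(pattern, lines, perl = TRUE))
  hits <- hits[lengths(hits) == 3]  # drop lines that did not match
  data.frame(
    subfield  = vapply(hits, `[`, character(1), 2),
    toc_count = as.integer(vapply(hits, `[`, character(1), 3)),
    stringsAsFactors = FALSE
  )
}

# Stage 3: walk the body line by line, track the current section header,
# and flush one record whenever an "eJobs ID:" marker is seen.
scrape_records <- function(lines, subfields) {
  records <- list()
  current_subfield <- NA_character_
  buffer <- character(0)
  for (line in lines) {
    if (line %in% subfields) {
      current_subfield <- line          # new section header
      buffer <- character(0)
    } else if (grepl("eJobs ID:", line, fixed = TRUE)) {
      records[[length(records) + 1]] <- data.frame(
        ejobs_id  = sub(".*eJobs ID:\\s*(\\S+).*", "\\1", line),
        subfield  = current_subfield,
        full_text = paste(buffer, collapse = " "),
        stringsAsFactors = FALSE
      )
      buffer <- character(0)            # start collecting the next record
    } else {
      buffer <- c(buffer, line)
    }
  }
  do.call(rbind, records)
}

toc <- parse_toc_counts(toc_page)
jobs_raw <- scrape_records(body_lines, toc$subfield)
```

Keeping the TOC parser separate from the body scraper means the two counts are produced independently, which is what makes the later comparison between them meaningful.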
1.2 Verification Report
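The verification step compares the TOC count with the scraped count for each issue and subfield. A minimal base R sketch of that comparison is shown here; the issue label, the counts, and the column names `scraped_count`, `diff`, and `match` are made up for illustration and are not taken from the report's data.

```r
# Hypothetical inputs for a single issue (illustrative numbers only).
toc_counts <- data.frame(
  issue     = "2024-01",
  subfield  = c("AP", "CP", "IR"),
  toc_count = c(12L, 8L, 10L),
  stringsAsFactors = FALSE
)
scraped <- data.frame(
  issue    = "2024-01",
  subfield = rep(c("AP", "CP", "IR"), times = c(12, 7, 10)),
  stringsAsFactors = FALSE
)

# Count scraped records per issue x subfield.
scraped_counts <- aggregate(
  list(scraped_count = rep(1L, nrow(scraped))),
  by  = scraped[c("issue", "subfield")],
  FUN = sum
)

# Left-join onto the TOC counts so subfields with no scraped rows still appear.
verification <- merge(toc_counts, scraped_counts,
                      by = c("issue", "subfield"), all.x = TRUE)
verification$scraped_count[is.na(verification$scraped_count)] <- 0L
verification$diff  <- verification$scraped_count - verification$toc_count
verification$match <- verification$diff == 0L

# Rows that need manual inspection.
verification[!verification$match, ]
```

In this toy input the CP row is flagged because one scraped record is missing relative to the TOC.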
2 Subfield Analysis
2.1 Summary Statistics by Subfield
| Subfield | N Jobs | N w/ Salary | Median Salary | Mean Salary | % Tenure-Track | % Visiting | % Teaching Track | % Postdoc |
|---|---|---|---|---|---|---|---|---|
| Methods | 1496 | 191 | $65,000 | $72,254 | 15.4% | 3.9% | 4.3% | 6.5% |
| CP | 1490 | 172 | $65,000 | $67,282 | 16.4% | 2.9% | 3.0% | 7.0% |
| IR | 1461 | 179 | $60,000 | $66,753 | 14.6% | 2.7% | 3.1% | 8.5% |
| AP | 946 | 105 | $65,000 | $68,731 | 16.4% | 4.2% | 5.5% | 9.6% |
| Other | 946 | 94 | $65,000 | $67,877 | 16.1% | 3.1% | 3.7% | 6.0% |
| PT | 942 | 78 | $65,000 | $64,371 | 16.2% | 3.8% | 3.9% | 5.1% |
| PL | 634 | 60 | $75,000 | $74,696 | 23.0% | 4.1% | 2.8% | 4.1% |
| PP | 574 | 61 | $72,500 | $78,408 | 15.9% | 2.4% | 3.5% | 7.1% |
| Open | 525 | 64 | $65,000 | $73,694 | 12.6% | 1.5% | 4.2% | 7.4% |
| Non-Academic | 501 | 54 | $65,000 | $72,906 | 12.4% | 2.2% | 3.4% | 4.6% |
| Admin | 494 | 55 | $65,000 | $76,880 | 14.4% | 3.6% | 4.0% | 3.8% |
| PAdmin | 159 | 15 | $65,000 | $67,880 | 17.6% | 1.3% | 5.0% | 1.3% |
2.2 Trends Over Time
toc_year %>%
filter(!is.na(year), !is.na(subfield)) %>%
ggplot(aes(x = year, y = toc_count, colour = subfield)) +
geom_line(linewidth = 0.9) +
geom_point(size = 1.8) +
scale_colour_brewer(palette = "Paired") +
scale_x_continuous(breaks = pretty_breaks(n = 8)) +
labs(title = "Job Listings by Subfield Over Time",
subtitle = "Counts sourced from the Table of Contents of each issue",
x = NULL, y = "Listings", colour = "Subfield") +
theme(legend.position = "bottom",
legend.text = element_text(size = 9, family = PAL))

annual_totals %>%
filter(!is.na(year)) %>%
ggplot(aes(x = year, y = total, fill = total)) +
geom_col(show.legend = FALSE) +
geom_text(aes(label = comma(total)), vjust = -0.4,
size = 3.2, family = PAL) +
scale_fill_viridis_c(option = "D", direction = -1) +
scale_x_continuous(breaks = pretty_breaks(n = 8)) +
scale_y_continuous(labels = comma, expand = expansion(mult = c(0, .1))) +
labs(title = "Total Political Science Job Listings Per Year",
x = NULL, y = "Total Listings")

toc_year %>%
filter(!is.na(year), !is.na(subfield)) %>%
ggplot(aes(x = year,
y = reorder(subfield, toc_count),
fill = toc_count)) +
geom_tile(colour = "white", linewidth = 0.4) +
geom_text(aes(label = toc_count), size = 2.6,
colour = "white", family = PAL) +
scale_fill_viridis_c(option = "C", name = "Listings") +
scale_x_continuous(breaks = pretty_breaks(n = 8)) +
labs(title = "Subfield × Year Heatmap", x = NULL, y = NULL) +
theme(axis.text.x = element_text(angle = 45, hjust = 1, family = PAL),
legend.position = "bottom",
legend.text = element_text(size = 9, family = PAL))
3 Rank Analysis
3.1 Distribution of Rank Categories
rank_order <- jobs %>%
count(rank_category) %>% arrange(n) %>% pull(rank_category)
jobs %>%
count(rank_category) %>%
mutate(rank_category = factor(rank_category, levels = rank_order)) %>%
ggplot(aes(x = n, y = rank_category, fill = n)) +
geom_col(show.legend = FALSE) +
geom_text(aes(label = comma(n)), hjust = -0.2,
size = 3.4, family = PAL) +
scale_fill_viridis_c(option = "E", direction = -1) +
scale_x_continuous(expand = expansion(mult = c(0, .18))) +
labs(title = "Job Listings by Rank Category",
x = "Number of Listings", y = NULL)
3.2 Rank Composition by Subfield
jobs %>%
filter(rank_category %in% tt_ranks, !is.na(subfield)) %>%
count(subfield, rank_category) %>%
group_by(subfield) %>%
mutate(pct = n / sum(n)) %>% ungroup() %>%
ggplot(aes(x = reorder(subfield, -n, sum), y = pct,
fill = factor(rank_category, levels = rev(tt_ranks)))) +
geom_col() +
scale_y_continuous(labels = percent_format()) +
scale_fill_brewer(palette = "Spectral", name = "Rank") +
labs(title = "Rank Composition by Subfield",
x = NULL, y = "Share of Listings") +
theme(axis.text.x = element_text(angle = 35, hjust = 1, family = PAL),
legend.position = "bottom",
legend.text = element_text(size = 9, family = PAL))
3.3 Rank Trends Over Time
jobs %>%
filter(rank_category %in% tt_ranks, !is.na(year)) %>%
count(year, rank_category) %>%
ggplot(aes(x = year, y = n, colour = rank_category)) +
geom_line(linewidth = 0.8) +
geom_point(size = 1.5) +
scale_colour_brewer(palette = "Dark2", name = "Rank") +
scale_x_continuous(breaks = pretty_breaks(n = 8)) +
labs(title = "Rank Category Trends Over Time",
x = NULL, y = "Listings") +
theme(legend.position = "bottom",
legend.text = element_text(size = 9, family = PAL))

Key observation: Tenure-track assistant professor positions (Asst Prof (TT)) typically dominate the market. Watch the Visiting Professor and Teaching Track trend lines: a rising share may signal structural shifts in how departments staff their courses.
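Because the plot above shows raw counts, a rising share is easier to spot after normalizing within each year. The following base R sketch shows one way to do that; the counts are made up for illustration, and the exact category labels in the real `rank_category` column may differ from the ones assumed here.

```r
# Made-up yearly counts, for illustration only (not from the eJobs data).
toy <- data.frame(
  year = rep(c(2022, 2023), each = 3),
  rank_category = rep(c("Asst Prof (TT)", "Visiting Professor", "Teaching Track"),
                      times = 2),
  n = c(60, 10, 10, 50, 15, 15),
  stringsAsFactors = FALSE
)

# Share of each rank category within its year.
toy$share <- toy$n / ave(toy$n, toy$year, FUN = sum)

# Year-over-year change in the Visiting Professor share.
visiting <- toy[toy$rank_category == "Visiting Professor", ]
visiting_change <- diff(visiting$share[order(visiting$year)])
```

With these toy numbers the total number of listings is unchanged between the two years, yet the Visiting Professor share rises, which is the pattern the observation above asks the reader to watch for.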
4 Geographic Distribution
4.1 Listings by US Region
jobs %>%
count(region) %>%
arrange(desc(n)) %>%
ggplot(aes(x = reorder(region, n), y = n, fill = region)) +
geom_col(show.legend = FALSE) +
geom_text(aes(label = comma(n)), hjust = -0.2,
size = 3.5, family = PAL) +
scale_fill_brewer(palette = "Set2") +
scale_y_continuous(expand = expansion(mult = c(0, .15))) +
coord_flip() +
labs(title = "Job Listings by US Region",
x = NULL, y = "Listings")
4.2 Top 20 States
jobs %>%
filter(!is.na(state_raw)) %>%
count(state_raw, sort = TRUE) %>%
slice_head(n = 20) %>%
ggplot(aes(x = n, y = reorder(state_raw, n), fill = n)) +
geom_col(show.legend = FALSE) +
geom_text(aes(label = comma(n)), hjust = -0.2,
size = 3.3, family = PAL) +
scale_fill_viridis_c(option = "D", direction = -1) +
scale_x_continuous(expand = expansion(mult = c(0, .15))) +
labs(title = "Top 20 States by Job Listings",
x = "Listings", y = NULL)
4.3 State Choropleth (Detail)
5 Salary Analysis
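Only listings with an explicit numeric salary (a `salary_est` above $10,000) are included here. Most listings say “Competitive” or “Commensurate with experience” and are excluded: in the summary table in Section 2.1, only about 8% to 13% of the listings in each subfield report a numeric salary. Interpret these figures with caution.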
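The report does not show how `salary_est` is derived, so the following is only a sketch of how an explicit numeric salary might be separated from free-text entries. The helper `parse_salary()`, the regular expression, and the choice to take the first dollar amount in a range are assumptions, not the report's method; the final filter mirrors the `salary_est > 10000` threshold used in the code below.

```r
# Hypothetical helper: return the first dollar amount in each string as a
# number, or NA when the string contains no dollar amount.
parse_salary <- function(x) {
  pos <- regexpr("\\$\\s?[0-9][0-9,]*(\\.[0-9]+)?", x)
  out <- rep(NA_real_, length(x))
  found <- pos > 0
  amounts <- regmatches(x, pos)  # one entry per matching element
  out[found] <- as.numeric(gsub("[$,[:space:]]", "", amounts))
  out
}

# Illustrative salary strings (not taken from the listings).
salary_text <- c(
  "Competitive",
  "$65,000",
  "Commensurate with experience",
  "$72,500 - $80,000",
  "$50 per hour"
)
salary_est <- parse_salary(salary_text)

# Same rule as the analysis code: numeric and above 10,000.
keep <- !is.na(salary_est) & salary_est > 10000
```

The threshold drops hourly or per-course amounts that would otherwise distort annual salary summaries.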
5.1 Salary by Rank and Subfield
sal_df <- jobs %>% filter(!is.na(salary_est), salary_est > 10000)
p_sal_rank <- sal_df %>%
filter(rank_category %in% tt_ranks) %>%
ggplot(aes(x = reorder(rank_category, salary_est, median),
y = salary_est, fill = rank_category)) +
geom_boxplot(outlier.shape = 21, outlier.size = 1.5, show.legend = FALSE) +
scale_y_continuous(labels = dollar_format()) +
scale_fill_brewer(palette = "Spectral") +
coord_flip() +
labs(title = "Salary Distribution by Rank",
subtitle = "Listings with explicit numeric salary only",
x = NULL, y = "Estimated Annual Salary")
p_sal_sf <- sal_df %>%
filter(!is.na(subfield)) %>%
ggplot(aes(x = reorder(subfield, salary_est, median),
y = salary_est, fill = subfield)) +
geom_boxplot(outlier.shape = 21, outlier.size = 1.5, show.legend = FALSE) +
scale_y_continuous(labels = dollar_format()) +
scale_fill_brewer(palette = "Paired") +
coord_flip() +
labs(title = "Salary Distribution by Subfield",
x = NULL, y = "Estimated Annual Salary")
p_sal_rank / p_sal_sf
5.2 Salary Trend Over Time
sal_df %>%
filter(!is.na(year)) %>%
group_by(year) %>%
summarise(median_sal = median(salary_est),
mean_sal = mean(salary_est),
n = n(), .groups = "drop") %>%
ggplot(aes(x = year)) +
geom_ribbon(aes(ymin = median_sal, ymax = mean_sal),
alpha = 0.18, fill = "steelblue") +
geom_line(aes(y = median_sal, colour = "Median"), linewidth = 1) +
geom_line(aes(y = mean_sal, colour = "Mean"),
linewidth = 1, linetype = "dashed") +
scale_y_continuous(labels = dollar_format()) +
scale_x_continuous(breaks = pretty_breaks(n = 8)) +
scale_colour_manual(values = c(Median = "steelblue", Mean = "tomato"),
name = NULL) +
labs(title = "Salary Trend Over Time",
subtitle = paste0("n = ", comma(nrow(sal_df)),
" listings with explicit numeric salary"),
x = NULL, y = "Annual Salary") +
theme(legend.position = "bottom",
legend.text = element_text(size = 9, family = PAL))
6 Text Mining
6.1 Top 30 Terms
tidy_words %>%
count(word, sort = TRUE) %>%
slice_head(n = 30) %>%
ggplot(aes(x = n, y = reorder(word, n), fill = n)) +
geom_col(show.legend = FALSE) +
geom_text(aes(label = comma(n)), hjust = -0.2,
size = 3.3, family = PAL) +
scale_fill_viridis_c(option = "D", direction = -1) +
scale_x_continuous(labels = comma,
expand = expansion(mult = c(0, .15))) +
labs(title = "Top 30 Terms Across All Job Listings",
subtitle = "After removing stopwords; based on rank + unit fields",
x = "Frequency", y = NULL)
6.2 TF-IDF: Distinctive Terms by Subfield
TF-IDF (Term Frequency–Inverse Document Frequency) weights how often a word appears within one subfield by how few subfields use it at all. Words that are frequent in one subfield but rare elsewhere score highest, which reveals the distinctive vocabulary of each field.
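To make the scoring concrete, the sketch below computes TF-IDF by hand in base R on a tiny made-up word count table, treating each subfield as one document. It assumes the common definition (term frequency as a share of the subfield's words, inverse document frequency as a natural log), which to our understanding is also what `tidytext::bind_tf_idf()` uses; the words and counts are illustrative only.

```r
# Made-up word counts per subfield (each subfield acts as one "document").
counts <- data.frame(
  subfield = c("IR", "IR", "PT", "PT"),
  word     = c("security", "politics", "theory", "politics"),
  n        = c(3, 1, 2, 2),
  stringsAsFactors = FALSE
)

# tf: a word's share of all words counted in its subfield.
counts$tf <- counts$n / ave(counts$n, counts$subfield, FUN = sum)

# idf: ln(number of subfields / number of subfields containing the word).
n_docs   <- length(unique(counts$subfield))
doc_freq <- table(counts$word)  # each (subfield, word) pair appears once
counts$idf <- log(n_docs / as.numeric(doc_freq[counts$word]))

counts$tf_idf <- counts$tf * counts$idf
counts[order(-counts$tf_idf), ]
```

A word such as "politics" that appears in every subfield gets an idf of zero, so it drops out no matter how frequent it is, while "security" and "theory" keep positive scores.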
tidy_words %>%
filter(!is.na(subfield)) %>%
count(subfield, word) %>%
bind_tf_idf(word, subfield, n) %>%
group_by(subfield) %>%
slice_max(tf_idf, n = 8) %>% ungroup() %>%
mutate(word = reorder_within(word, tf_idf, subfield)) %>%
ggplot(aes(x = tf_idf, y = word, fill = subfield)) +
geom_col(show.legend = FALSE) +
facet_wrap(~ subfield, scales = "free_y", ncol = 3) +
scale_y_reordered() +
scale_fill_brewer(palette = "Paired") +
labs(title = "Most Distinctive Terms by Subfield (TF-IDF)",
subtitle = "Words uniquely associated with each subfield",
x = "TF-IDF Score", y = NULL) +
theme(axis.text.y = element_text(size = 8, family = PAL))
6.3 TF-IDF: Distinctive Terms by Rank
tidy_words %>%
filter(rank_category %in% tt_ranks) %>%
count(rank_category, word) %>%
bind_tf_idf(word, rank_category, n) %>%
group_by(rank_category) %>%
slice_max(tf_idf, n = 8) %>% ungroup() %>%
mutate(word = reorder_within(word, tf_idf, rank_category)) %>%
ggplot(aes(x = tf_idf, y = word, fill = rank_category)) +
geom_col(show.legend = FALSE) +
facet_wrap(~ rank_category, scales = "free_y", ncol = 3) +
scale_y_reordered() +
scale_fill_brewer(palette = "Spectral") +
labs(title = "Most Distinctive Terms by Rank (TF-IDF)",
x = "TF-IDF Score", y = NULL) +
theme(axis.text.y = element_text(size = 8, family = PAL))
6.4 Top Bigrams
jobs %>%
unnest_tokens(bigram, full_text, token = "ngrams", n = 2) %>%
separate(bigram, into = c("w1","w2"), sep = " ") %>%
filter(!w1 %in% stop_words$word, !w2 %in% stop_words$word,
!w1 %in% ps_stopwords$word, !w2 %in% ps_stopwords$word,
str_length(w1) > 2, str_length(w2) > 2,
!str_detect(w1,"^\\d+$"), !str_detect(w2,"^\\d+$")) %>%
unite(bigram, w1, w2, sep = " ") %>%
count(bigram, sort = TRUE) %>%
slice_head(n = 25) %>%
ggplot(aes(x = n, y = reorder(bigram, n), fill = n)) +
geom_col(show.legend = FALSE) +
geom_text(aes(label = comma(n)), hjust = -0.2,
size = 3.3, family = PAL) +
scale_fill_viridis_c(option = "C", direction = -1) +
scale_x_continuous(labels = comma,
expand = expansion(mult = c(0, .15))) +
labs(title = "Top 25 Bigrams in Job Listings",
subtitle = "Common two-word phrases after stopword removal",
x = "Frequency", y = NULL)
6.5 Word Clouds by Subfield
7 Browse All Jobs
8 Appendix
8.1 R Session Info
sessionInfo()
R version 4.5.3 (2026-03-11)
Platform: aarch64-apple-darwin20
Running under: macOS Tahoe 26.4
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Chicago
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggtext_0.1.2 plotly_4.11.0 DT_0.33 kableExtra_1.4.0
[5] knitr_1.50 patchwork_1.3.2 maps_3.4.3 ggridges_0.5.6
[9] viridis_0.6.5 viridisLite_0.4.2 wordcloud_2.6 RColorBrewer_1.1-3
[13] SnowballC_0.7.1 tidytext_0.4.2 lubridate_1.9.5 scales_1.4.0
[17] ggplot2_4.0.0 tibble_3.3.0 stringr_1.6.0 tidyr_1.3.2
[21] dplyr_1.2.0
loaded via a namespace (and not attached):
[1] janeaustenr_1.0.0 sass_0.4.10 generics_0.1.4 xml2_1.5.1
[5] stringi_1.8.7 lattice_0.22-9 digest_0.6.39 magrittr_2.0.4
[9] evaluate_1.0.5 grid_4.5.3 timechange_0.4.0 fastmap_1.2.0
[13] jsonlite_2.0.0 Matrix_1.7-4 gridExtra_2.3 httr_1.4.7
[17] purrr_1.2.1 crosstalk_1.2.1 jquerylib_0.1.4 codetools_0.2-20
[21] lazyeval_0.2.2 textshaping_1.0.1 cli_3.6.5 rlang_1.1.7
[25] tokenizers_0.3.0 cachem_1.1.0 withr_3.0.2 yaml_2.3.10
[29] tools_4.5.3 vctrs_0.7.2 R6_2.6.1 lifecycle_1.0.5
[33] htmlwidgets_1.6.4 pkgconfig_2.0.3 bslib_0.9.0 pillar_1.11.1
[37] gtable_0.3.6 data.table_1.17.2 glue_1.8.0 Rcpp_1.1.0
[41] systemfonts_1.2.3 xfun_0.54 tidyselect_1.2.1 rstudioapi_0.17.1
[45] farver_2.1.2 htmltools_0.5.8.1 labeling_0.4.3 svglite_2.2.1
[49] rmarkdown_2.30 compiler_4.5.3 S7_0.2.0 gridtext_0.1.5
8.2 Data Pipeline Summary
| Stage | Tool | Output |
|---|---|---|
| PDF text extraction | `pdftools::pdf_text()` | Raw character vectors |
| TOC count parsing | stringr regex on pages 1–3 | `ps_jobs_toc_counts.csv` |
| Body text scraping | Line-by-line section tracker + eJobs ID flush | `ps_jobs_all_raw.csv` |
| Deduplication | `dplyr::distinct(ejobs_id)` | `ps_jobs_all_unique.csv` |
| Verification | TOC count vs scraped count per issue × subfield | `ps_jobs_verification.csv` |
| Analytics & Report | ggplot2, tidytext, maps, Quarto | This document |
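The deduplication stage can be illustrated with a short base R sketch. The toy rows and issue labels are made up, and the premise that one listing can recur across issues is an assumption rather than something the report states. One detail worth noting: in dplyr, `distinct()` called on a single column returns only that column unless `.keep_all = TRUE` is set, so the sketch keeps all columns explicitly.

```r
# Made-up raw records in which listing 1001 appears in two issues.
jobs_raw <- data.frame(
  ejobs_id = c("1001", "1002", "1001", "1003"),
  issue    = c("2024-01", "2024-01", "2024-02", "2024-02"),
  subfield = c("AP", "CP", "AP", "IR"),
  stringsAsFactors = FALSE
)

# Keep the first appearance of each eJobs ID together with all its columns.
# dplyr equivalent: dplyr::distinct(jobs_raw, ejobs_id, .keep_all = TRUE)
jobs_unique <- jobs_raw[!duplicated(jobs_raw$ejobs_id), ]

# Number of repeated rows removed.
n_removed <- nrow(jobs_raw) - nrow(jobs_unique)
```

Keeping the first appearance attributes each listing to the issue in which it was first scraped; keeping the last appearance instead would be an equally defensible choice, depending on the question being asked.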
8.3 Subfield Code Reference
| Code | Full Name |
|---|---|
| AP | American Government and Politics |
| CP | Comparative Politics |
| IR | International Relations |
| Methods | Methodology |
| PT | Political Theory |
| PL | Public Law |
| PP | Public Policy |
| PAdmin | Public Administration |
| Admin | Administration |
| Non-Academic | Non-Academic Positions |
| Open | Open Subfield |
| Other | Other |