vignette-010-tidy-data.Rmd
Wrangle and Tidy Package data
All of the following data sources will be read in as raw data from
slfrsk/data/
(originally cleaned a bit from raw data
versions in slfrsk/data-raw/
) and then cleaned up
individually in this vignette (see
/data-raw/convert_data_rda.R
for script and commenting that
includes earlier unit conversion and transformation of raw data from
.csv
to .rda
files used here). Depending on
the situation, the resultant “cleaned/tidied” data may be saved to
slfrsk/data/
as a new .rda
file. In this
vignette, the following data are treated:
Previously, we built three separate species distribution models
(SDMs) with MaxEnt (See species distribution modeling analysis for more
information). We also obtained summary statistics for suitability in
each geopolitical unit (country and U.S. state) in each SDM (as well as
an ensemble SDM), which were transferred to the
states_extracts
and countries_extracts
datasets (or states_extracts_ensemble
and
countries_extracts_ensemble
for the ensemble SDM).
Here, we explore the ensemble maximum suitability by obtaining the maximum suitability per geopolitical unit for each SDM and calculating the mean and standard error across the models (or just use the extracted maximum value for ensemble models). These summarized suitabilities constitute our predicted Establishment potential.
We also strip Western Sahara from the countries
datasets here and for other datsets, as it is not informative for our
analyses (zero trade for Transport potential
below).
#new extract of ensemble
if(TRUE){
data("states_extracts_ensemble")
summary_states_extracts <- states_extracts_ensemble %>%
group_by(geopol_unit) %>%
summarize(grand_mean_max = mean(obs_max))
data("countries_extracts_ensemble")
summary_countries_extracts <- countries_extracts_ensemble %>%
group_by(geopol_unit) %>%
summarize(grand_mean_max = mean(obs_max))
#rm W. Sahara
summary_countries_extracts <- summary_countries_extracts %>%
filter(geopol_unit != "Western Sahara")
}
Now that Establishment data are nicely organized, we
can move on to Transport. We estimate Transport
potential based on the average annual mass of goods traded (in
metric tons) with SLF-established U.S. states over
2012–2017 (we also track value of traded goods in USD). Our existing
data can be partitioned into geopolitical units as above (states and
countries), but can further be separated into two categories: 1. states
with confirmed SLF populations (present
)
and 2. states that have a high probability of hosting
SLF populations in the near future based on proximity
to present
states and records of isolated observations
(future
).
Although we maintain several possible permutations of data to
analyze, we focus on mass of goods traded for present
scenarios, as these data are unlikely to be skewed by trade of high
value goods that are less likely to lead to transportation of egg masses
(e.g., electronics) and provide us with a more careful outlook for risk
of slf paninvasion.
We first treat the states data by adding two letter codes and
SLF-establishment status. For present
, we
treat the following states as having established SLF:
CT, DE, MD, NJ, NY, OH, PA, VA, and WV. For
future
, we add the following states as having established
SLF: NC. From here, the data are
tidied and gathered before calculating the mean and standard error (SE)
of trade with established states for 2012–2017. To do so, we first
removed all cases of intrastate trade for established states. Then, we
calculated the total trade with established states for each year before
obtaining the overall mean and SE.
Below, we demonstrate the code for both present
and
future
trade data in value (USD) for U.S.
states
.
#load states value
data("states_trade_summary_slf")
data("states_trade_summary_slf_future")
#code snippet to attach infection status and add state abbreviations
#state id's for adding to extracts
stateid <- c("AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA", "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD", "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ", "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC", "SD", "TN", "TX", "UT", "VT", "VA", "WA", "DC", "WV", "WI", "WY")
stateid <- tibble(stateid, states_trade_summary_slf$destination)
colnames(stateid) <- c("ID", "geopol_unit")
#PRESENT
#add in the infection status of each state while adding the abbreviations
states_trade_summary_slf <- left_join(states_trade_summary_slf, stateid, by = c("destination" = "geopol_unit")) %>%
mutate(status = ifelse(ID %in% c("CT", "DE", "MD", "NJ", "NY", "OH", "PA", "VA", "WV"), "established", "not established")) %>%
dplyr::select(destination, ID, status, everything())
#tidying of data
states_trade_summary_slf <- states_trade_summary_slf %>%
gather(-destination:-status, key = "infected_state", value = "trade") %>%
as.data.frame(.)
#total infected states trade, average/SE
summary_states_trade_value <- states_trade_summary_slf %>%
group_by(destination) %>%
#filter to be just trade with infected states present
filter(str_detect(infected_state, "pa_") |
str_detect(infected_state, "de_") |
str_detect(infected_state, "va_") |
str_detect(infected_state, "nj_") |
str_detect(infected_state, "md_") |
str_detect(infected_state, "ct_") |
str_detect(infected_state, "ny_") |
str_detect(infected_state, "oh_") |
str_detect(infected_state, "wv_")
) %>%
#remove the all state trade from consideration
filter(str_detect(infected_state, "201") & !str_detect(infected_state, "all")) %>%
#make a temporary col
mutate(tempcol = str_split(infected_state, "_")) %>%
#group by row
rowwise() %>%
#split out the info about states and years of trade
mutate(source = unlist(tempcol)[1], year = unlist(tempcol)[2]) %>%
#rm the temp col
dplyr::select(-tempcol) %>%
#group based now on the destination, year, and infection status
#filter out the intrastate trade for infected states to avoid extra low averages by creating a hybrid string (DestinationState+abbreviated_infected state)
group_by(destination, year, status) %>%
mutate(clear_intra = paste0(destination, source)) %>%
filter(!clear_intra %in%
c("Connecticutct",
"Delawarede",
"Marylandmd",
"New Jerseynj",
"New Yorkny",
"Ohiooh",
"Pennsylvaniapa",
"Virginiava",
"West Virginiawv")
) %>%
#remove workaround infected intrastate column
dplyr::select(-clear_intra) %>%
#add up the trade with infected states for each destination state and year (THIS IS NOT AVG INFECTED TRADE YET)
summarize(avg_infected_trade = sum(trade)) %>%
group_by(destination, status) %>%
#obtain the standard error (SE) of infected trade across years (this repeats across all of the entries across years and thus becomes its own value)
mutate(se = (sd(avg_infected_trade, na.rm = T) / sqrt(length(year)))) %>%
#summarize the mean and SE across years! (there is no need to actually take the mean of SE here, but it makes summarizing simplier)
summarise(avg_infected_trade = mean(avg_infected_trade, na.rm = T), se = mean(se))
#FUTURE
states_trade_summary_slf_future <- left_join(states_trade_summary_slf_future, stateid, by = c("destination" = "geopol_unit")) %>%
mutate(status = ifelse(ID %in% c("CT", "DE", "MD", "NC", "NJ", "NY", "OH", "PA", "VA", "WV"), "established", "not established")) %>%
dplyr::select(destination, ID, status, everything())
#tidying of data
states_trade_summary_slf_future <- states_trade_summary_slf_future %>%
gather(-destination:-status, key = "infected_state", value = "trade") %>%
as.data.frame(.)
#total infected states trade, average/SE
summary_states_trade_value_future <- states_trade_summary_slf_future %>%
group_by(destination) %>%
filter(str_detect(infected_state, "pa_") |
str_detect(infected_state, "de_") |
str_detect(infected_state, "va_") |
str_detect(infected_state, "nj_") |
str_detect(infected_state, "md_") |
str_detect(infected_state, "nc_") |
str_detect(infected_state, "ny_") |
str_detect(infected_state, "oh_") |
str_detect(infected_state, "wv_") |
str_detect(infected_state, "ct_")
) %>%
filter(str_detect(infected_state, "201") & !str_detect(infected_state, "all")) %>%
mutate(tempcol = str_split(infected_state, "_")) %>%
rowwise() %>%
mutate(source = unlist(tempcol)[1], year = unlist(tempcol)[2]) %>%
dplyr::select(-tempcol) %>%
group_by(destination, year, status) %>%
#filter out the intrastate trade for infected states to avoid extra low averages
mutate(clear_intra = paste0(destination, source)) %>%
filter(!clear_intra %in%
c("Connecticutct",
"Delawarede",
"Marylandmd",
"North Carolinanc",
"New Jerseynj",
"New Yorkny",
"Ohiooh",
"Pennsylvaniapa",
"Virginiava",
"West Virginiawv"
)
) %>%
#remove workaround infected intrastate column
dplyr::select(-clear_intra) %>%
summarize(avg_infected_trade = sum(trade)) %>%
group_by(destination, status) %>%
mutate(se = (sd(avg_infected_trade, na.rm = T) / sqrt(length(year)))) %>%
summarise(avg_infected_trade = mean(avg_infected_trade, na.rm = T), se = mean(se))
We repeat the same process for states
but for the total
mass of imported goods (mass
). In doing so, we convert the
existing units from kg to metric tons.
#load states mass
data("states_trade_mass_summary_slf")
data("states_trade_mass_summary_slf_future")
#PRESENT
#add in the infection status of each state while adding the abbreviations
states_trade_mass_summary_slf <- left_join(states_trade_mass_summary_slf, stateid, by = c("destination" = "geopol_unit")) %>%
mutate(status = ifelse(ID %in% c("CT", "DE", "MD", "NJ", "NY", "OH", "PA", "VA", "WV"), "established", "not established")) %>%
dplyr::select(destination, ID, status, everything())
#tidying of data and dividing by 1000 to convert from kg to metric tons
states_trade_mass_summary_slf <- states_trade_mass_summary_slf %>%
gather(-destination:-status, key = "infected_state", value = "mass") %>%
mutate(mass = mass / 1000) %>%
as.data.frame(.)
#mass
summary_states_trade_mass <- states_trade_mass_summary_slf %>%
group_by(destination) %>%
#filter to be just trade with infected states present
filter(str_detect(infected_state, "pa_") |
str_detect(infected_state, "de_") |
str_detect(infected_state, "va_") |
str_detect(infected_state, "nj_") |
str_detect(infected_state, "md_") |
str_detect(infected_state, "ct_") |
str_detect(infected_state, "ny_") |
str_detect(infected_state, "oh_") |
str_detect(infected_state, "wv_")
) %>%
filter(str_detect(infected_state, "201") & !str_detect(infected_state, "all")) %>%
mutate(tempcol = str_split(infected_state, "_")) %>%
rowwise() %>%
mutate(source = unlist(tempcol)[1], year = unlist(tempcol)[2]) %>%
dplyr::select(-tempcol) %>%
group_by(destination, year, status) %>%
#filter out the intrastate trade for infected states to avoid extra low averages
mutate(clear_intra = paste0(destination, source)) %>%
filter(!clear_intra %in%
c("Connecticutct",
"Delawarede",
"Marylandmd",
"New Jerseynj",
"New Yorkny",
"Ohiooh",
"Pennsylvaniapa",
"Virginiava",
"West Virginiawv")
) %>%
#remove workaround infected intrastate column
dplyr::select(-clear_intra) %>%
summarize(avg_infected_mass = sum(mass)) %>%
group_by(destination, status) %>%
mutate(se = (sd(avg_infected_mass, na.rm = T) / sqrt(length(year)))) %>%
summarise(avg_infected_mass = mean(avg_infected_mass, na.rm = T), se = mean(se))
#FUTURE
#add col to identify infected states
states_trade_mass_summary_slf_future <- left_join(states_trade_mass_summary_slf_future, stateid, by = c("destination" = "geopol_unit")) %>%
mutate(status = ifelse(ID %in% c("CT", "DE", "MD", "NC", "NJ", "NY", "OH", "PA", "VA", "WV"), "established", "not established")) %>%
dplyr::select(destination, ID, status, everything())
#tidying of data and dividing by 1000 to convert from kg to metric tons
states_trade_mass_summary_slf_future <- states_trade_mass_summary_slf_future %>%
gather(-destination:-status, key = "infected_state", value = "mass") %>%
mutate(mass = mass / 1000) %>%
as.data.frame(.)
#mass
summary_states_trade_mass_future <- states_trade_mass_summary_slf_future %>%
group_by(destination) %>%
filter(str_detect(infected_state, "pa_") |
str_detect(infected_state, "de_") |
str_detect(infected_state, "va_") |
str_detect(infected_state, "nj_") |
str_detect(infected_state, "md_") |
str_detect(infected_state, "nc_") |
str_detect(infected_state, "ny_") |
str_detect(infected_state, "oh_") |
str_detect(infected_state, "wv_") |
str_detect(infected_state, "ct_")
) %>%
filter(str_detect(infected_state, "201") & !str_detect(infected_state, "all")) %>%
mutate(tempcol = str_split(infected_state, "_")) %>%
rowwise() %>%
mutate(source = unlist(tempcol)[1], year = unlist(tempcol)[2]) %>%
dplyr::select(-tempcol) %>%
group_by(destination, year, status) %>%
#filter out the intrastate trade for infected states to avoid extra low averages
mutate(clear_intra = paste0(destination, source)) %>%
filter(!clear_intra %in%
c("Connecticutct",
"Delawarede",
"Marylandmd",
"North Carolinanc",
"New Jerseynj",
"New Yorkny",
"Ohiooh",
"Pennsylvaniapa",
"Virginiava",
"West Virginiawv"
)
) %>%
#remove workaround infected intrastate column
dplyr::select(-clear_intra) %>%
summarize(avg_infected_mass = sum(mass)) %>%
group_by(destination, status) %>%
mutate(se = (sd(avg_infected_mass, na.rm = T) / sqrt(length(year)))) %>%
summarise(avg_infected_mass = mean(avg_infected_mass, na.rm = T), se = mean(se))
Here, you can see the summary for the first 10 states for the
future
trade data for mass of trade (metric tons).
destination | status | avg_infected_mass | se |
---|---|---|---|
Alabama | not established | 8042185.41 | 89597.110 |
Alaska | not established | 91775.66 | 3473.339 |
Arizona | not established | 1541726.22 | 30613.203 |
Arkansas | not established | 2122765.62 | 15105.761 |
California | not established | 16449015.73 | 447722.953 |
Colorado | not established | 1789258.81 | 25597.146 |
Connecticut | established | 24300544.86 | 249553.550 |
Delaware | established | 17481966.16 | 345415.534 |
District of Columbia | not established | 3122179.60 | 30663.092 |
Florida | not established | 13669574.02 | 397146.471 |
From here, we turn to complete the same analyses for
countries
as states (value of trade in USD and then mass of
trade in metric tons). Notably, we do not have to worry about the
intrastate trade that we filtered out in the states
datasets. As such, we do not bother with attaching a
$status
identifier to each country (this is done as needed
for visualization later).
#COUNTRIES
#VALUE
#load countries values
data("countries_trade_summary_slf")
data("countries_trade_summary_slf_future")
#PRESENT
#tidying of data and log transforming values
countries_trade_summary_slf <- countries_trade_summary_slf %>%
gather(-destination, key = "infected_state", value = "trade") %>%
as.data.frame(.)
#value
summary_countries_trade_value <- countries_trade_summary_slf %>%
group_by(destination) %>%
filter(str_detect(infected_state, "pa_") |
str_detect(infected_state, "de_") |
str_detect(infected_state, "va_") |
str_detect(infected_state, "nj_") |
str_detect(infected_state, "md_") |
str_detect(infected_state, "ct_") |
str_detect(infected_state, "ny_") |
str_detect(infected_state, "oh_") |
str_detect(infected_state, "wv_")
) %>%
filter(str_detect(infected_state, "201") & !str_detect(infected_state, "all")) %>%
mutate(tempcol = str_split(infected_state, "_")) %>%
rowwise() %>%
mutate(source = unlist(tempcol)[1], year = unlist(tempcol)[2]) %>%
dplyr::select(-tempcol) %>%
group_by(destination, year) %>%
summarize(avg_infected_trade = sum(trade)) %>%
group_by(destination) %>%
mutate(se = (sd(avg_infected_trade, na.rm = T) / sqrt(length(year)))) %>%
summarise(avg_infected_trade = mean(avg_infected_trade, na.rm = T), se = mean(se))
#FUTURE
#tidying of data and log transforming values
countries_trade_summary_slf_future <- countries_trade_summary_slf_future %>%
gather(-destination, key = "infected_state", value = "trade") %>%
as.data.frame(.)
#value
summary_countries_trade_value_future <- countries_trade_summary_slf_future %>%
group_by(destination) %>%
filter(str_detect(infected_state, "pa_") |
str_detect(infected_state, "de_") |
str_detect(infected_state, "va_") |
str_detect(infected_state, "nj_") |
str_detect(infected_state, "md_") |
str_detect(infected_state, "nc_") |
str_detect(infected_state, "ny_") |
str_detect(infected_state, "oh_") |
str_detect(infected_state, "wv_") |
str_detect(infected_state, "ct_")
) %>%
filter(str_detect(infected_state, "201") & !str_detect(infected_state, "all")) %>%
mutate(tempcol = str_split(infected_state, "_")) %>%
rowwise() %>%
mutate(source = unlist(tempcol)[1], year = unlist(tempcol)[2]) %>%
dplyr::select(-tempcol) %>%
group_by(destination, year) %>%
summarize(avg_infected_trade = sum(trade)) %>%
group_by(destination) %>%
mutate(se = (sd(avg_infected_trade, na.rm = T) / sqrt(length(year)))) %>%
summarise(avg_infected_trade = mean(avg_infected_trade, na.rm = T), se = mean(se))
#MASS
#load countries mass
data("countries_trade_mass_summary_slf")
data("countries_trade_mass_summary_slf_future")
#PRESENT
#tidying of data and converting from kg to metric tons
countries_trade_mass_summary_slf <- countries_trade_mass_summary_slf %>%
gather(-destination, key = "infected_state", value = "mass") %>%
mutate(mass = mass / 1000) %>%
as.data.frame(.)
#mass
summary_countries_trade_mass <- countries_trade_mass_summary_slf %>%
group_by(destination) %>%
filter(str_detect(infected_state, "pa_") |
str_detect(infected_state, "de_") |
str_detect(infected_state, "va_") |
str_detect(infected_state, "nj_") |
str_detect(infected_state, "md_") |
str_detect(infected_state, "ct_") |
str_detect(infected_state, "ny_") |
str_detect(infected_state, "oh_") |
str_detect(infected_state, "wv_")
) %>%
filter(str_detect(infected_state, "201") & !str_detect(infected_state, "all")) %>%
mutate(tempcol = str_split(infected_state, "_")) %>%
rowwise() %>%
mutate(source = unlist(tempcol)[1], year = unlist(tempcol)[2]) %>%
dplyr::select(-tempcol) %>%
group_by(destination, year) %>%
summarize(avg_infected_mass = sum(mass)) %>%
group_by(destination) %>%
mutate(se = (sd(avg_infected_mass, na.rm = T) / sqrt(length(year)))) %>%
summarise(avg_infected_mass = mean(avg_infected_mass, na.rm = T), se = mean(se))
#FUTURE
#tidying of data and converting from kg to metric tons
countries_trade_mass_summary_slf_future <- countries_trade_mass_summary_slf_future %>%
gather(-destination, key = "infected_state", value = "mass") %>%
mutate(mass = mass / 1000) %>%
as.data.frame(.)
#mass
summary_countries_trade_mass_future <- countries_trade_mass_summary_slf_future %>%
group_by(destination) %>%
filter(str_detect(infected_state, "pa_") |
str_detect(infected_state, "de_") |
str_detect(infected_state, "va_") |
str_detect(infected_state, "nj_") |
str_detect(infected_state, "md_") |
str_detect(infected_state, "nc_") |
str_detect(infected_state, "ny_") |
str_detect(infected_state, "oh_") |
str_detect(infected_state, "wv_") |
str_detect(infected_state, "ct_")
) %>%
filter(str_detect(infected_state, "201") & !str_detect(infected_state, "all")) %>%
mutate(tempcol = str_split(infected_state, "_")) %>%
rowwise() %>%
mutate(source = unlist(tempcol)[1], year = unlist(tempcol)[2]) %>%
dplyr::select(-tempcol) %>%
group_by(destination, year) %>%
summarize(avg_infected_mass = sum(mass)) %>%
group_by(destination) %>%
mutate(se = (sd(avg_infected_mass, na.rm = T) / sqrt(length(year)))) %>%
summarise(avg_infected_mass = mean(avg_infected_mass, na.rm = T), se = mean(se))
Here, you can see the summary for the first 10 countries for the
future
trade data for mass of trade (metric tons), just like thestates
data.
destination | avg_infected_mass | se |
---|---|---|
Afghanistan | 29071.49650 | 9715.82565 |
Albania | 13101.98933 | 3989.92572 |
Algeria | 138022.73517 | 35664.42137 |
Andorra | 96.19667 | 24.11481 |
Angola | 29848.68950 | 3197.03340 |
Anguilla | 1281.39267 | 184.55033 |
Antigua and Barbuda | 4080.22700 | 1238.47426 |
Argentina | 643884.60350 | 151554.71666 |
Armenia | 6604.53150 | 968.13989 |
Aruba | 57376.83367 | 30835.63316 |
Now that we have several permutations of trade with
SLF-established states data (both present
and future
scenarios) for both geopolitical units
(states
and countries
) as measured in
value and mass (USD and metric tons,
respectively), we can combine some of these data into fewer dataframes
to minimize the number of files we are juggling for future analyses. To
do this, we merge the value and mass
data for each geopolitical unit dataset.
As we do so, we take the time to create \(\log_{10}\) transformed versions of the trade value and mass data. To avoid possible errors for this transformation, we add a constant (\(value+1\)) to data prior to transformation. We use the same transformation process for Impact potential data.
#STATES
summary_states_trade <- left_join(summary_states_trade_value, summary_states_trade_mass, by = c("destination", "status"), suffix = c("_trade", "_mass")) %>%
mutate(log10_avg_infected_trade = log10(avg_infected_trade+1), log10_avg_infected_mass = log10(avg_infected_mass+1))
summary_states_trade_future <- left_join(summary_states_trade_value_future, summary_states_trade_mass_future, by = c("destination", "status"), suffix = c("_trade", "_mass")) %>%
mutate(log10_avg_infected_trade = log10(avg_infected_trade+1), log10_avg_infected_mass = log10(avg_infected_mass+1))
#COUNTRIES
summary_countries_trade <- left_join(summary_countries_trade_value, summary_countries_trade_mass, by = c("destination"), suffix = c("_trade", "_mass")) %>%
filter(destination != "Western Sahara") %>%
mutate(log10_avg_infected_trade = log10(avg_infected_trade+1), log10_avg_infected_mass = log10(avg_infected_mass+1))
summary_countries_trade_future <- left_join(summary_countries_trade_value_future, summary_countries_trade_mass_future, by = c("destination"), suffix = c("_trade", "_mass")) %>%
filter(destination != "Western Sahara") %>%
mutate(log10_avg_infected_trade = log10(avg_infected_trade+1), log10_avg_infected_mass = log10(avg_infected_mass+1))
Here is the merged
states
data forfuture
:
destination | status | avg_infected_trade | se_trade | avg_infected_mass | se_mass | log10_avg_infected_trade | log10_avg_infected_mass |
---|---|---|---|---|---|---|---|
Alabama | not established | 25746519575 | 86757733 | 8042185.41 | 89597.110 | 10.41072 | 6.905374 |
Alaska | not established | 3682900711 | 231580906 | 91775.66 | 3473.339 | 9.56619 | 4.962732 |
Arizona | not established | 16164900001 | 134829727 | 1541726.22 | 30613.203 | 10.20857 | 6.188008 |
Arkansas | not established | 10808696870 | 42330206 | 2122765.62 | 15105.761 | 10.03377 | 6.326902 |
California | not established | 114295367886 | 471914734 | 16449015.73 | 447722.953 | 11.05803 | 7.216140 |
Colorado | not established | 15392144303 | 237938106 | 1789258.81 | 25597.146 | 10.18730 | 6.252673 |
Connecticut | established | 50771613052 | 367967442 | 24300544.86 | 249553.550 | 10.70562 | 7.385616 |
Delaware | established | 26854722185 | 498504512 | 17481966.16 | 345415.534 | 10.42902 | 7.242590 |
District of Columbia | not established | 11196345230 | 573341994 | 3122179.60 | 30663.092 | 10.04908 | 6.494458 |
Florida | not established | 67241610026 | 770574348 | 13669574.02 | 397146.471 | 10.82764 | 7.135755 |
And here is the merged
countries
data forfuture
:
destination | avg_infected_trade | se_trade | avg_infected_mass | se_mass | log10_avg_infected_trade | log10_avg_infected_mass |
---|---|---|---|---|---|---|
Afghanistan | 235695855 | 86773679.7 | 29071.49650 | 9715.82565 | 8.372352 | 4.463482 |
Albania | 16536138 | 2452105.0 | 13101.98933 | 3989.92572 | 7.218434 | 4.117370 |
Algeria | 503940704 | 64383082.2 | 138022.73517 | 35664.42137 | 8.702379 | 5.139954 |
Andorra | 1457978 | 174345.8 | 96.19667 | 24.11481 | 6.163751 | 1.987651 |
Angola | 119484021 | 34733545.6 | 29848.68950 | 3197.03340 | 8.077310 | 4.474940 |
Anguilla | 3681561 | 196681.7 | 1281.39267 | 184.55033 | 6.566032 | 3.108021 |
Antigua and Barbuda | 14561905 | 2078482.2 | 4080.22700 | 1238.47426 | 7.163218 | 3.610791 |
Argentina | 1201585227 | 96144901.7 | 643884.60350 | 151554.71666 | 9.079755 | 5.808809 |
Armenia | 22230451 | 1940347.2 | 6604.53150 | 968.13989 | 7.346948 | 3.819908 |
Aruba | 76165589 | 9470751.5 | 57376.83367 | 30835.63316 | 7.881759 | 4.758744 |
Given the close relationship between cultivation of grapes
(grapes
) and winemaking (wine
), we evaluate
impact potential of SLF based on data from both.
For grapes
, two forms of cultivation data were
available:
Although we provide both sources of data, we focus more on production as a total measure of output per geopolitical unit.
data("states_grapes")
#summarize and log transform
summary_states_grapes <- states_grapes %>%
pivot_wider(names_from = Item, values_from = Value, id_cols = c(geopol_unit, Year)) %>%
group_by(geopol_unit) %>%
summarize(avg_yield = mean(grape_yield, na.rm = TRUE),
se_yield = sd(grape_yield) / sqrt(n_distinct(Year)),
avg_prod = mean(grape_production, na.rm = T), se_prod = sd(grape_production) / sqrt(n_distinct(Year))
) %>%
drop_na(.) %>%
mutate(log10_avg_yield = log10(avg_yield+1), log10_avg_prod = log10(avg_prod+1))
data("countries_grapes")
#first rm Western Sahara, then summarize and log transform, while adding in the step of changing NA's to zeros
summary_countries_grapes <- countries_grapes %>%
filter(geopol_unit != "Western Sahara") %>%
pivot_wider(names_from = Item, values_from = Value, id_cols = c(geopol_unit, Year)) %>%
group_by(geopol_unit) %>%
summarize(avg_yield = mean(grape_yield, na.rm = TRUE),
se_yield = sd(grape_yield) / sqrt(n_distinct(Year)),
avg_prod = mean(grape_production, na.rm = T), se_prod = sd(grape_production) / sqrt(n_distinct(Year))
) %>%
drop_na(.) %>%
mutate(log10_avg_yield = log10(avg_yield+1), log10_avg_prod = log10(avg_prod+1))
#fix some problematic names of countries
summary_countries_grapes$geopol_unit[grep(pattern = "Ivoire", x = summary_countries_grapes$geopol_unit, value = F)] <- "Ivory Coast"
summary_countries_grapes$geopol_unit[grep(pattern = "Bolivia (Plurinational State of)", x = summary_countries_grapes$geopol_unit, value = F)] <- "Bolivia"
summary_countries_grapes$geopol_unit[grep(pattern = "China, Taiwan Province of", x = summary_countries_grapes$geopol_unit, value = F)] <- "Taiwan"
summary_countries_grapes$geopol_unit[grep(pattern = "Iran (Islamic Republic of)", x = summary_countries_grapes$geopol_unit, value = F)] <- "Iran"
summary_countries_grapes$geopol_unit[grep(pattern = "Venezuela (Bolivarian Republic of)", x = summary_countries_grapes$geopol_unit, value = F)] <- "Venezuela"
summary_countries_grapes$geopol_unit[grep(pattern = "Viet Nam", x = summary_countries_grapes$geopol_unit, value = F)] <- "Vietnam"
For wine
, we were able to obtain wine production for
both geopolitical unit datasets in units of mass, metric tons.
data("states_wine")
#summarize and log transform
summary_states_wine <- states_wine %>%
group_by(geopol_unit) %>%
summarize(avg_wine = mean(Mass, na.rm = T), se_wine = sd(Mass) / sqrt(n_distinct(Year)), log10_avg_wine = log10(mean(Mass, na.rm = T)+1))
data("countries_wine")
#summarize and log transform, then rm anything that started out as NA or zero
summary_countries_wine <- countries_wine %>%
group_by(geopol_unit) %>%
summarize(avg_wine = mean(Mass, na.rm = T), se_wine = sd(Mass) / sqrt(n_distinct(Year)), log10_avg_wine = log10(mean(Mass, na.rm = T)+1)) %>%
filter(!is.infinite(log10_avg_wine) & !is.nan(log10_avg_wine))
Now that all of these data sources have been summarized and tidied,
we will merge them together with left_join()
sequentially,
using the trade
data (Transport potential)
as the inital dataset that all others are joined to. The resultant
datasets contain all permuatations of geopolitical units
(states
and countries
) and infection scenarios
(present
and future
).
During this process, we also use the wine
and
grape
data to produce binary categorical data for each
geopolitical unit to indicate whether it is involved directly in that
industry. For this, we also manually add the United States as a
grape producer in the countries
datasets. Additionally, we
manually add $status
to the countries
datasets
as follows:
#STATES
##PRESENT
#add the binaries
states_summary_present <- summary_states_extracts %>%
mutate(grapes = ifelse(geopol_unit %in% summary_states_grapes$geopol_unit, yes = "yes", no = "no")) %>%
mutate(wine = ifelse(geopol_unit %in% summary_states_wine$geopol_unit, yes = "yes", no = "no")) %>%
mutate(grapes = factor(grapes, levels = c("yes", "no")), wine = factor(wine, levels = c("yes", "no")))
#now add in the other data
states_summary_present <- states_summary_present %>%
left_join(., summary_states_trade, by = c("geopol_unit" = "destination")) %>% #trade
left_join(., summary_states_wine, by = "geopol_unit") %>% #wine
left_join(., summary_states_grapes, by = "geopol_unit") #grapes
#deal with NA
states_summary_present[is.na(states_summary_present)] <- 0
#join the new ID's
states_summary_present <- left_join(stateid, states_summary_present, by = "geopol_unit")
#################################
#tidy the order of cols
states_summary_present <- states_summary_present %>%
#dplyr::select(ID, geopol_unit, status, grand_mean_max, grand_se_max, wine, grapes, avg_wine:log10_avg_prod, everything())
dplyr::select(ID, geopol_unit, status, grand_mean_max, wine, grapes, avg_wine:log10_avg_prod, everything())
##FUTURE
#add the binaries
states_summary_future <- summary_states_extracts %>%
mutate(grapes = ifelse(geopol_unit %in% summary_states_grapes$geopol_unit, yes = "yes", no = "no")) %>%
mutate(wine = ifelse(geopol_unit %in% summary_states_wine$geopol_unit, yes = "yes", no = "no")) %>%
mutate(grapes = factor(grapes, levels = c("yes", "no")), wine = factor(wine, levels = c("yes", "no")))
#now add in the other data
states_summary_future <- states_summary_future %>%
left_join(., summary_states_trade_future, by = c("geopol_unit" = "destination")) %>% #trade
left_join(., summary_states_wine, by = "geopol_unit") %>% #wine
left_join(., summary_states_grapes, by = "geopol_unit") #grapes
#deal with NA
states_summary_future[is.na(states_summary_future)] <- 0
#join the new ID's
states_summary_future <- left_join(stateid, states_summary_future, by = "geopol_unit")
#tidy the order of cols
states_summary_future <- states_summary_future %>%
#dplyr::select(ID, geopol_unit, status, grand_mean_max, grand_se_max, wine, grapes, avg_wine:log10_avg_prod, everything())
dplyr::select(ID, geopol_unit, status, grand_mean_max, wine, grapes, avg_wine:log10_avg_prod, everything())
#################################
#COUNTRIES
##PRESENT
#add in the binaries
countries_summary_present <- summary_countries_extracts %>%
mutate(grapes = ifelse(geopol_unit %in% summary_countries_grapes$geopol_unit, yes = "yes", no = "no")) %>%
mutate(wine = ifelse(geopol_unit %in% summary_countries_wine$geopol_unit, yes = "yes", no = "no"))
#change U.S. from no to yes for grapes
countries_summary_present$grapes[countries_summary_present$geopol_unit == "United States"] <- "yes"
#The factor fixing step is now here
countries_summary_present <- countries_summary_present %>%
mutate(grapes = factor(grapes, levels = c("yes", "no")), wine = factor(wine, levels = c("yes", "no")))
#now drop NAs for the extracts and add in the other data
countries_summary_present <- countries_summary_present %>%
drop_na(.) %>%
left_join(., summary_countries_trade, by = c("geopol_unit" = "destination")) %>% #trade
left_join(., summary_countries_wine, by = "geopol_unit") %>% #wine
left_join(., summary_countries_grapes, by = "geopol_unit") #grapes
#deal with NA
countries_summary_present[is.na(countries_summary_present)] <- 0
# add ID to make is simpler for labeling points
countries_summary_present <- countries_summary_present %>% mutate(ID = geopol_unit)
# add a column for status establishment
countries_summary_present <- countries_summary_present %>% mutate(status = "not established")
countries_summary_present$status[countries_summary_present$geopol_unit %in% c("China", "India", "Taiwan", "Vietnam")] <- "native"
countries_summary_present$status[countries_summary_present$geopol_unit %in% c("Japan", "South Korea", "United States")] <- "established"
#tidy the order of cols
countries_summary_present <- countries_summary_present %>%
#dplyr::select(ID, geopol_unit, status, grand_mean_max, grand_se_max, wine, grapes, avg_wine:log10_avg_prod, everything())
dplyr::select(ID, geopol_unit, status, grand_mean_max, wine, grapes, avg_wine:log10_avg_prod, everything())
##FUTURE
#add in the binaries
countries_summary_future <- summary_countries_extracts %>%
mutate(grapes = ifelse(geopol_unit %in% summary_countries_grapes$geopol_unit, yes = "yes", no = "no")) %>%
mutate(wine = ifelse(geopol_unit %in% summary_countries_wine$geopol_unit, yes = "yes", no = "no"))
# NOTE: must This also includes manually changing the **United States** to a grape producer.
#change U.S. from no to yes for grapes
countries_summary_future$grapes[countries_summary_future$geopol_unit == "United States"] <- "yes"
#The factor fixing step is now here
countries_summary_future <- countries_summary_future %>%
mutate(grapes = factor(grapes, levels = c("yes", "no")), wine = factor(wine, levels = c("yes", "no")))
#now drop NAs for the extracts and add in the other data
countries_summary_future <- countries_summary_future %>%
drop_na(.) %>%
left_join(., summary_countries_trade_future, by = c("geopol_unit" = "destination")) %>% #trade
left_join(., summary_countries_wine, by = "geopol_unit") %>% #wine
left_join(., summary_countries_grapes, by = "geopol_unit") #grapes
#deal with NA
countries_summary_future[is.na(countries_summary_future)] <- 0
# add ID to make is simpler for labeling points
countries_summary_future <- countries_summary_future %>% mutate(ID = geopol_unit)
# add a column for status establishment
countries_summary_future <- countries_summary_future %>% mutate(status = "not established")
countries_summary_future$status[countries_summary_future$geopol_unit %in% c("China", "India", "Taiwan", "Vietnam")] <- "native"
countries_summary_future$status[countries_summary_future$geopol_unit %in% c("Japan", "South Korea", "United States")] <- "established"
#tidy the order of cols
countries_summary_future <- countries_summary_future %>%
#dplyr::select(ID, geopol_unit, status, grand_mean_max, grand_se_max, wine, grapes, avg_wine:log10_avg_prod, everything())
dplyr::select(ID, geopol_unit, status, grand_mean_max, wine, grapes, avg_wine:log10_avg_prod, everything())
This section provides the code to save each of the final summary
objects to the /data
directory. They are currently set to
not run, as the package should ship with the files
already in /data
. Note that this chunk can be run by
setting the top if
from FALSE
to
TRUE
!
if(FALSE){
save(states_summary_present, file = file.path(here(), "data", "states_summary_present.rda"))
save(states_summary_future, file = file.path(here(), "data", "states_summary_future.rda"))
save(countries_summary_present, file = file.path(here(), "data", "countries_summary_present.rda"))
save(countries_summary_future, file = file.path(here(), "data", "countries_summary_future.rda"))
}
if(FALSE){
states_summary_future_ensemble <- states_summary_future
states_summary_present_ensemble <- states_summary_present
countries_summary_future_ensemble <- countries_summary_future
countries_summary_present_ensemble <- countries_summary_present
save(states_summary_future_ensemble, file = file.path(here(), "data", "states_summary_future_ensemble.rda"))
save(states_summary_present_ensemble, file = file.path(here(), "data", "states_summary_present_ensemble.rda"))
save(countries_summary_future_ensemble, file = file.path(here(), "data", "countries_summary_future_ensemble.rda"))
save(countries_summary_present_ensemble, file = file.path(here(), "data", "countries_summary_present_ensemble.rda"))
}