R - Agriculture Crop Production EDA - Part 1

R - Agriculture Crop Production EDA - Part 1

I found a dataset on data.world that had a great deal of information on country crop yields. It contained 5 data tables divided into different groupings including Africa, The Americas, Asia, Europe, and Oceania. It also contained 3 more tables that helped explain the dataset and offer context about the data collection, measurements, etc.

Lets preview the data concerning Africa.

Area Code Area Item Code Item Element Code Element Unit Y1961
4 Algeria 221 Almonds, with shell 5312 Area harvested ha 13300
4 Algeria 221 Almonds, with shell 5419 Yield hg/ha 4511
4 Algeria 221 Almonds, with shell 5510 Production tonnes 6000
4 Algeria 515 Apples 5312 Area harvested ha 3400
4 Algeria 515 Apples 5419 Yield hg/ha 45294
4 Algeria 515 Apples 5510 Production tonnes 15400

Each agricultural product in each country has three entries. The 'Element' column describes the measure. Three elements are included 'Area harvested', 'Yield', and 'Production'. 'Yield' is an aggregate of the 'Area harvested', and 'Production'.

The data for the years is ordered by column as well.

Y1961F Y1962 Y1962F Y1963 Y1963F Y1964 Y1964F Y1965
F 13300 F 13300 F 14200 F 13800
Fc 4511 Fc 4511 Fc 4507 Fc 4493

6000
6000
6400
6200
F 3100 F 2800 F 2700 F 2900
Fc 45161 Fc 46429 Fc 46078 Fc 45348

14000
13000
12441
13151

This continues until 2018. Notice how there is a date with an 'F' at the end of it? This allows us to know how the data was collected. This data can be found in the 'Flags.csv' file included in the dataset. I include it for those are curious, but it was not used in this data exploration. I'll use an ellipse to denote where I snipped the dataset.

Flag Flags

Official data
* Unofficial figure
A Aggregate, may include official, semi-official, estimated or calculated data
F FAO estimate
Fb Data obtained as a balance
Fc Calculated data

TO THE ANALYSIS

We will be answering 5 questions on this data.

     
  1. Which country in Africa produced the most coffee in 2018?
  2.  
  3. Which country in Africa produced the most coffee per acre in 2018?
  4.  
  5. Which country in Africa has the largest increase in the amount of coffee produced since 1961?
  6.  
  7. Which country (in the world, not specifically Africa) saw the highest increase in the amount of coffee grown per acre since 1961?*
  8.  
  9. Which continent (or rather, major grouping as presented in the 5 data sets) has the largest increase in the amount of coffee grown per acre since 1961?*

*This post contains solutions for 1-3, the next post details 4 and 5


1 Which country in Africa produced the most coffee in 2018?

We are going to start by importing the data. Then we distill the columns that we to display by subsetting the dataset and including only 'Area', 'Item', 'Element', 'Unit', and the year in question 'Y2018'.

The idea here is simple, we are going to filter where the 'Item Code' is equal to '656', and filter on the 'Element' column where it is equal to 'Production'. After this it is a matter of sorting data on the 'Y2018' column, as this column contains the numerical 'Production' values.

#Import dplyr
library(dplyr)

#Import data
africa_table = read.csv("C:\\Users\\WillPortFolio\\Desktop\\Blog\\Project 05 R and Apache Superset\\Data\\agriculture-crop-production\\Production_Crops_E_Africa.csv")


q1 <- africa_table %>% 
  select(Area, Item.Code, Item, Element, Unit, Y2018) %>%
  filter(Element == "Production") %>%
  filter(Item.Code == 656) %>%
  arrange(desc(Y2018))

#Output as a csv vile
write.csv(q1africa_table, "C:\\Users\\WillPortFolio\\Desktop\\Blog\\Project 05 R and Apache Superset\\q1.csv")

OUTPUT:

Area Item.Code Item Element Unit Y2018
Ethiopia 656 Coffee, green Production tonnes 494574
Uganda 656 Coffee, green Production tonnes 284225
Madagascar 656 Coffee, green Production tonnes 60114
United Republic of Tanzania 656 Coffee, green Production tonnes 43193
Guinea 656 Coffee, green Production tonnes 42900
Kenya 656 Coffee, green Production tonnes 41375
C�te d'Ivoire 656 Coffee, green Production tonnes 39092
Rwanda 656 Coffee, green Production tonnes 38643
Democratic Republic of the Congo 656 Coffee, green Production tonnes 31145
Cameroon 656 Coffee, green Production tonnes 30590
Sierra Leone 656 Coffee, green Production tonnes 20480
Togo 656 Coffee, green Production tonnes 19076
Angola 656 Coffee, green Production tonnes 16308
Burundi 656 Coffee, green Production tonnes 14216
Malawi 656 Coffee, green Production tonnes 11082

2 Which country in Africa produced the most coffee per acre in 2018?

This is a similar problem, there is no need to aggregate the data because the data is already aggregated, it is a matter of not selecting "Production" as the "Element", but filtering by "Yield".

library(dplyr)
africa_table = read.csv("C:\\Users\\WillPortFolio\\Desktop\\Blog\\Project 05 R and Apache Superset\\Data\\agriculture-crop-production\\Production_Crops_E_Africa.csv")


#Only keep Element values equal to 'Yield'
q2 <- filter(africa_table, Element == 'Yield') %>%
  #Our focus is on coffee
  filter(Item =='Coffee, green') %>%
  #Select output columns
  select('Area', 'Item', 'Unit', 'Y2018') %>%
  #Sort highest yield to lowest
  arrange(desc(Y2018))

OUTPUT:


Area Item Unit Y2018
1 Malawi Coffee, green hg/ha 23345
2 Sierra Leone Coffee, green hg/ha 20706
3 Ghana Coffee, green hg/ha 16959
4 Rwanda Coffee, green hg/ha 16506
5 Nigeria Coffee, green hg/ha 12886
6 Burundi Coffee, green hg/ha 9658

3 Which country in Africa has the largest increase in the amount of coffee produced since 1961?

A different type of question. I need to compare values, and use the percent change formula.

#consider dplyr and africa_table already imported

#Select required columns
q3 <- select(africa_table, 'Area', 'Item', 'Unit', 'Element','Y1961','Y2018') %>%
  #filter by coffee and production
  filter(Item == 'Coffee, green', Element == 'Production') %>%
  #create a new column for percent change rounded to 2 decimal places
  mutate(PercentChange = round(((Y2018 - Y1961) / Y1961)*100,2)) %>%
  #arrange in descending order using PercentChange
  arrange(desc(PercentChange))  

OUTPUT:


Area Item Unit Element Y1961 Y2018 PercentChange
1 Malawi Coffee, green tonnes Production 167 11082 6535.93
2 Sierra Leone Coffee, green tonnes Production 5103 20480 301.33
3 Rwanda Coffee, green tonnes Production 10500 38643 268.03
4 Congo Coffee, green tonnes Production 900 3049 238.78
5 Uganda Coffee, green tonnes Production 94100 284225 202.05
6 Guinea Coffee, green tonnes Production 15000 42900 186
7 Togo Coffee, green tonnes Production 10300 19076 85.2
8 Kenya Coffee, green tonnes Production 28100 41375 47.24
9 Comoros Coffee, green tonnes Production 100 140 40
10 United Republic of Tanzania Coffee, green tonnes Production 33000 43193 30.89
11 Central African Republic Coffee, green tonnes Production 8200 9391 14.52
12 Madagascar Coffee, green tonnes Production 54000 60114 11.32
13 Nigeria Coffee, green tonnes Production 1100 1161 5.55
14 Burundi Coffee, green tonnes Production 14000 14216 1.54
15 Mozambique Coffee, green tonnes Production 1000 825 -17.5
16 Cameroon Coffee, green tonnes Production 44700 30590 -31.57

We will continue with questions 4 and 5 in the next post.