By William M — Sep 3, 2024

R - Agriculture Crop Production EDA - Part 1

I found a dataset on data.world that had a great deal of information on country crop yields. It contained 5 data tables divided into different groupings including Africa, The Americas, Asia, Europe, and Oceania. It also contained 3 more tables that helped explain the dataset and offer context about the data collection, measurements, etc.

Lets preview the data concerning Africa.

Area Code	Area	Item Code	Item	Element Code	Element	Unit	Y1961
4	Algeria	221	Almonds, with shell	5312	Area harvested	ha	13300
4	Algeria	221	Almonds, with shell	5419	Yield	hg/ha	4511
4	Algeria	221	Almonds, with shell	5510	Production	tonnes	6000
4	Algeria	515	Apples	5312	Area harvested	ha	3400
4	Algeria	515	Apples	5419	Yield	hg/ha	45294
4	Algeria	515	Apples	5510	Production	tonnes	15400

Each agricultural product in each country has three entries. The 'Element' column describes the measure. Three elements are included 'Area harvested', 'Yield', and 'Production'. 'Yield' is an aggregate of the 'Area harvested', and 'Production'.

The data for the years is ordered by column as well.

Y1961F	Y1962	Y1962F	Y1963	Y1963F	Y1964	Y1964F	Y1965
F	13300	F	13300	F	14200	F	13800
Fc	4511	Fc	4511	Fc	4507	Fc	4493
	6000		6000		6400		6200
F	3100	F	2800	F	2700	F	2900
Fc	45161	Fc	46429	Fc	46078	Fc	45348
	14000		13000		12441		13151

This continues until 2018. Notice how there is a date with an 'F' at the end of it? This allows us to know how the data was collected. This data can be found in the 'Flags.csv' file included in the dataset. I include it for those are curious, but it was not used in this data exploration. I'll use an ellipse to denote where I snipped the dataset.

Flag	Flags
	Official data
*	Unofficial figure
A	Aggregate, may include official, semi-official, estimated or calculated data
…	…
F	FAO estimate
Fb	Data obtained as a balance
Fc	Calculated data
…	…

TO THE ANALYSIS

We will be answering 5 questions on this data.

Which country in Africa produced the most coffee in 2018?
Which country in Africa produced the most coffee per acre in 2018?
Which country in Africa has the largest increase in the amount of coffee produced since 1961?
Which country (in the world, not specifically Africa) saw the highest increase in the amount of coffee grown per acre since 1961?*
Which continent (or rather, major grouping as presented in the 5 data sets) has the largest increase in the amount of coffee grown per acre since 1961?*

*This post contains solutions for 1-3, the next post details 4 and 5

1 Which country in Africa produced the most coffee in 2018?

We are going to start by importing the data. Then we distill the columns that we to display by subsetting the dataset and including only 'Area', 'Item', 'Element', 'Unit', and the year in question 'Y2018'.

The idea here is simple, we are going to filter where the 'Item Code' is equal to '656', and filter on the 'Element' column where it is equal to 'Production'. After this it is a matter of sorting data on the 'Y2018' column, as this column contains the numerical 'Production' values.

#Import dplyr
library(dplyr)

#Import data
africa_table = read.csv("C:\\Users\\WillPortFolio\\Desktop\\Blog\\Project 05 R and Apache Superset\\Data\\agriculture-crop-production\\Production_Crops_E_Africa.csv")


q1 <- africa_table %>% 
  select(Area, Item.Code, Item, Element, Unit, Y2018) %>%
  filter(Element == "Production") %>%
  filter(Item.Code == 656) %>%
  arrange(desc(Y2018))

#Output as a csv vile
write.csv(q1africa_table, "C:\\Users\\WillPortFolio\\Desktop\\Blog\\Project 05 R and Apache Superset\\q1.csv")

OUTPUT:

Area	Item.Code	Item	Element	Unit	Y2018
Ethiopia	656	Coffee, green	Production	tonnes	494574
Uganda	656	Coffee, green	Production	tonnes	284225
Madagascar	656	Coffee, green	Production	tonnes	60114
United Republic of Tanzania	656	Coffee, green	Production	tonnes	43193
Guinea	656	Coffee, green	Production	tonnes	42900
Kenya	656	Coffee, green	Production	tonnes	41375
C�te d'Ivoire	656	Coffee, green	Production	tonnes	39092
Rwanda	656	Coffee, green	Production	tonnes	38643
Democratic Republic of the Congo	656	Coffee, green	Production	tonnes	31145
Cameroon	656	Coffee, green	Production	tonnes	30590
Sierra Leone	656	Coffee, green	Production	tonnes	20480
Togo	656	Coffee, green	Production	tonnes	19076
Angola	656	Coffee, green	Production	tonnes	16308
Burundi	656	Coffee, green	Production	tonnes	14216
Malawi	656	Coffee, green	Production	tonnes	11082

2 Which country in Africa produced the most coffee per acre in 2018?

This is a similar problem, there is no need to aggregate the data because the data is already aggregated, it is a matter of not selecting "Production" as the "Element", but filtering by "Yield".

library(dplyr)
africa_table = read.csv("C:\\Users\\WillPortFolio\\Desktop\\Blog\\Project 05 R and Apache Superset\\Data\\agriculture-crop-production\\Production_Crops_E_Africa.csv")


#Only keep Element values equal to 'Yield'
q2 <- filter(africa_table, Element == 'Yield') %>%
  #Our focus is on coffee
  filter(Item =='Coffee, green') %>%
  #Select output columns
  select('Area', 'Item', 'Unit', 'Y2018') %>%
  #Sort highest yield to lowest
  arrange(desc(Y2018))

OUTPUT:

	Area	Item	Unit	Y2018
1	Malawi	Coffee, green	hg/ha	23345
2	Sierra Leone	Coffee, green	hg/ha	20706
3	Ghana	Coffee, green	hg/ha	16959
4	Rwanda	Coffee, green	hg/ha	16506
5	Nigeria	Coffee, green	hg/ha	12886
6	Burundi	Coffee, green	hg/ha	9658

3 Which country in Africa has the largest increase in the amount of coffee produced since 1961?

A different type of question. I need to compare values, and use the percent change formula.

#consider dplyr and africa_table already imported

#Select required columns
q3 <- select(africa_table, 'Area', 'Item', 'Unit', 'Element','Y1961','Y2018') %>%
  #filter by coffee and production
  filter(Item == 'Coffee, green', Element == 'Production') %>%
  #create a new column for percent change rounded to 2 decimal places
  mutate(PercentChange = round(((Y2018 - Y1961) / Y1961)*100,2)) %>%
  #arrange in descending order using PercentChange
  arrange(desc(PercentChange))

OUTPUT:

	Area	Item	Unit	Element	Y1961	Y2018	PercentChange
1	Malawi	Coffee, green	tonnes	Production	167	11082	6535.93
2	Sierra Leone	Coffee, green	tonnes	Production	5103	20480	301.33
3	Rwanda	Coffee, green	tonnes	Production	10500	38643	268.03
4	Congo	Coffee, green	tonnes	Production	900	3049	238.78
5	Uganda	Coffee, green	tonnes	Production	94100	284225	202.05
6	Guinea	Coffee, green	tonnes	Production	15000	42900	186
7	Togo	Coffee, green	tonnes	Production	10300	19076	85.2
8	Kenya	Coffee, green	tonnes	Production	28100	41375	47.24
9	Comoros	Coffee, green	tonnes	Production	100	140	40
10	United Republic of Tanzania	Coffee, green	tonnes	Production	33000	43193	30.89
11	Central African Republic	Coffee, green	tonnes	Production	8200	9391	14.52
12	Madagascar	Coffee, green	tonnes	Production	54000	60114	11.32
13	Nigeria	Coffee, green	tonnes	Production	1100	1161	5.55
14	Burundi	Coffee, green	tonnes	Production	14000	14216	1.54
15	Mozambique	Coffee, green	tonnes	Production	1000	825	-17.5
16	Cameroon	Coffee, green	tonnes	Production	44700	30590	-31.57

We will continue with questions 4 and 5 in the next post.

R - Agriculture Crop Production EDA - Part 1

1 Which country in Africa produced the most coffee in 2018?

2 Which country in Africa produced the most coffee per acre in 2018?

3 Which country in Africa has the largest increase in the amount of coffee produced since 1961?

Power Bi - TFL Bus Safety Viz

R - Agriculture Crop Production EDA - Part 2