SQL EDA - AFDB Market Trends - Part 1

SQL EDA - AFDB Market Trends - Part 1
Photo by Iwaria Inc. / Unsplash

I will be exploring a dataset found on data.world that was originally found on The Humanitarian Data Exchange. This dataset contains numerous market indicators of importance to the African Development Bank, along with dates and values for different indicators. Below is a snippet of the first few rows of data.

Indicator IndicatorName Unit Frequency Date Value
91242809 Egypt CASE 30 Index
D 2011-01-02 7082.4
91242109 Tunisia Dinar
D 2011-01-03 1.4416
91242209 Platinum (USD/Troy Ounce) USD/Troy Ounce D 2011-01-03 1781.5
91242309 Kenya Nairobi SE Index- NSE 20
D 2011-01-03 4495.41

From this data I became curious to explore 7 different things:

  1. How many unique indicators are there?
  2. Is the number of unique indicators equal to the unique indicatorname field? If not explain the discrepancy.
  3. Remove all commodities from this table and put them in a separate table.
  4. From the remaining indicators, are there any gaps in reporting?
  5. Calculate the 'PercentChangeDaily' of indicators.*
  6. Calculate the 'AverageMonthly' value of different indicators.*
  7. What were the indicators that increased the most percentage OVERALL from the inception of this data.*

*The last 3 items will be covered in the next post.

For a more condensed version, view on github.


Let's start with the first question.

  1. How many unique indicators are there?
SELECT 
	COUNT(DISTINCT(indicator))
FROM 
	afdbmarkettrends2015

OUTPUT:

indicator_count
31

Problem 1 is complete. I've answered the question, but let's answer some questions that may be implied. I will output the 31 indicators along with the indicator names. We will sort the list alphabetically by the indicatorname.

SELECT 
	DISTINCT(indicator), 
	indicatorname
FROM 
	afdbmarkettrends2015
ORDER BY 
	indicatorname
indicator indicatorname
91244509 Baltic Dry Index
91242409 CFA zone Countries CFA Franc
91241909 Cocoa (USD/tonne)
91244109 Coffee Brazilian Naturals (US cents/tonne)
91242709 Coffee Robusta (US cents/tonne)
91243909 Copper (USD/lb)
91244709 Copper (USD/MT)
91243409 Cote d'Ivoire BRVM Composite Index
91243709 Cotton (USD/lb)
91244809 Crude Oil, Brent (USD/bbl)
91242809 Egypt CASE 30 Index
91243809 Egypt Pound
91243609 Europe EURO
91244909 Gold (USD/Troy Ounce)
91244609 Iron Ore (USD/Dry MT)
91242609 Kenya Kenyan Shilling
91242309 Kenya Nairobi SE Index- NSE 20
91242909 Mauritius Mauritius AllShares SEMDEX
91244009 Mauritius Rupee
91244309 Morocco Casa All Share Index
91243109 Morocco Dirham
91243009 Nigeria Naira
91242009 Nigeria NGSE All Share Index
91242209 Platinum (USD/Troy Ounce)
91245009 Silver (USD/Troy Ounce)
91242509 South Africa JALSH All Share Index
91243209 South Africa Rand
91242109 Tunisia Dinar
91243309 Tunisia Tunis se Tnse Index TUNINDEX
91243509 Uganda SE All Share index USE
91244209 Uganda Shilling

The second question outlines a 'sanity check' for the data. We will be summing the count of unique values in the indicator field, and the count of unique values in the indicatorname field, to make sure they are the same.

  1. Is the number of unique indicators equal to the unique inidcatorname field? If not explain the discrepancy.
SELECT 
	COUNT(DISTINCT(indicator)) AS indicator_count,
	COUNT(DISTINCT(indicatorname)) AS indicator_name_count
FROM 
	afdbmarkettrends2015
indicator_count indicator_name_count
31 31

The fields are equal, this aspect of the data is 'sane'.

Then we want to separate the commodities from this data. The unit column has a currency if and only if the item is a commodity. We will use this information to extract commodities, then we will use this data to delete the commodities after the data is transported to a new table.

  1. Remove all commodities from this table and put them in a separate table. Then we will count the number of items in the new table.
CREATE TABLE afdbmarketcommodities2015 AS
	SELECT 
		* 
	FROM 
		afdbmarkettrends2015
	WHERE 
		unit IS NOT NULL;

SELECT 
	COUNT (*) AS newcommoditiescount 
FROM 
	afdbmarketcommodities2015;

OUTPUT:

newcommoditiescount
11473

This makes the new table from the filtered data. The data was checked for accuracy, and we now need to remove the data from the afdbmarkettrends2015 table.

DELETE FROM 
	afdbmarkettrends2015 
WHERE 
	unit IS NOT NULL;

OUTPUT:

DELETE 11473 Query returned successfully in 11 msec.

Good.

When I saw the dates listed, I was genuinely curious if the days had any gaps. This will require some creative sorting by indicator and date, the creation of a new column that has the previous date that was on record. This sorting should be done separately for each indicator.

  1. From the remaining indicators, are there any gaps in reporting?
SELECT 
	indicatorname,
	date,
	previousday,
	CAST(date AS timestamp) - CAST(previousday AS timestamp)  AS timegap
FROM (
	SELECT 
		*,
		LAG(date) OVER(PARTITION BY (indicator) ORDER BY indicator, date) AS previousday
	FROM afdbmarkettrends2015
	ORDER BY 
		indicator, 
		date) 
	AS laggedtable

OUTPUT (small summary):

indicatorname date previousday timegap
Nigeria NGSE All Share Index 2011-01-04 NULL NULL
Nigeria NGSE All Share Index 2011-01-05 2011-01-04 1 day
Nigeria NGSE All Share Index 2011-01-06 2011-01-05 1 day
Nigeria NGSE All Share Index 2011-01-07 2011-01-06 1 day
Nigeria NGSE All Share Index 2011-01-10 2011-01-07 3 days

Of interest in the same output is that the counting starts over when a new indicator is listed.

indicatorname date previousday timegap
Nigeria NGSE All Share Index 2015-07-27 2015-07-24 3 days
Nigeria NGSE All Share Index 2015-07-28 2015-07-27 1 day
Tunisia Dinar 2011-01-03 NULL NULL
Tunisia Dinar 2011-01-04 2011-01-03 1 day
Tunisia Dinar 2011-01-05 2011-01-04 1 day
Tunisia Dinar 2011-01-06 2011-01-05 1 day
Tunisia Dinar 2011-01-07 2011-01-06 1 day
Tunisia Dinar 2011-01-10 2011-01-07 3 days

Good

Let's look at another

indicatorname date previousday timegap
Tunisia Dinar 2015-07-27 2015-07-24 3 days
Tunisia Dinar 2015-07-28 2015-07-27 1 day
Kenya Nairobi SE Index- NSE 20 2011-01-03 NULL NULL
Kenya Nairobi SE Index- NSE 20 2011-01-04 2011-01-03 1 day
Kenya Nairobi SE Index- NSE 20 2011-01-05 2011-01-04 1 day
Kenya Nairobi SE Index- NSE 20 2011-01-06 2011-01-05 1 day
Kenya Nairobi SE Index- NSE 20 2011-01-07 2011-01-06 1 day
Kenya Nairobi SE Index- NSE 20 2011-01-10 2011-01-07 3 days

Still Good

We will continue with tasks 5-7 in another post.

I will also be exploring this data visually with Tableau in this post if you wanted to skip ahead.