SQL EDA - AFDB Market Trends - Part 2

SQL EDA - AFDB Market Trends - Part 2
Computer screen showing a dashboard with charts and aggregate data

I will continue exploring a dataset found on data.world that was originally found on The Humanitarian Data Exchange. This dataset contains numerous market indicators of importance to the African Development Bank, along with dates and values for different indicators. Below is a snippet of the first few rows of data.

Indicator IndicatorName Unit Frequency Date Value
91242809 Egypt CASE 30 Index
D 2011-01-02 7082.4
91242109 Tunisia Dinar
D 2011-01-03 1.4416
91242209 Platinum (USD/Troy Ounce) USD/Troy Ounce D 2011-01-03 1781.5
91242309 Kenya Nairobi SE Index- NSE 20
D 2011-01-03 4495.41

From this data I became curious to explore 7 different things:

  1. How many unique indicators are there?
  2. Is the number of unique indicators equal to the unique indicatorname field? If not explain the discrepancy.
  3. Remove all commodities from this table and put them in a separate table.
  4. From the remaining indicators, are there any gaps in reporting?
  5. Calculate the 'PercentChangeDaily' of indicators.
  6. Calculate the 'AverageMonthly' value of different indicators.
  7. What were the indicators that increased the most percentage OVERALL from the inception of this data.

I have explored the first four tasks in another post. I will finish the final three tasks of this project now.

Our fist task will be to create a new field for our table. This will be a comparison to previous dates, and will be on a per indicator basis, meaning the data must be partitioned, sorted, and lagged.

  1. Create a new field for "PercentChangeDaily"
SELECT 
	indicatorname, 
    date, 
    value, 
    yesterday,
	ROUND(((yesterday - value) / value * 100), 5) AS percentchange
FROM(
	SELECT 
		*,
		LAG(value) OVER (PARTITION BY 
						 indicatorname 
    						 ORDER BY 
    						 	indicatorname, 
    						 	date) AS yesterday
	FROM 
		afdbmarkettrends2015
ORDER BY 
	indicatorname, 
	date) AS yesterdaystable

OUTPUT:

indicatorname date value yesterday percentchange
Baltic Dry Index 2011-01-04 1693 NULL NULL
Baltic Dry Index 2011-01-05 1621 1693 4.4417
Baltic Dry Index 2011-01-06 1544 1621 4.98705
Baltic Dry Index 2011-01-07 1519 1544 1.64582
Baltic Dry Index 2011-01-10 1495 1519 1.60535
Baltic Dry Index 2011-01-11 1480 1495 1.01351
Baltic Dry Index 2011-01-12 1453 1480 1.85822
Baltic Dry Index 2011-01-13 1438 1453 1.04312
Baltic Dry Index 2011-01-14 1439 1438 -0.06949
Baltic Dry Index 2011-01-17 1439 1439 0
Baltic Dry Index 2011-01-18 1432 1439 0.48883

This output creates a new column with yesterdays values, and uses that new column to calculate the percent change from the previous date on a per indicatorname basis.

The next task is to create an average monthly value field. To do this I will extract the month and year from the date column. And i will then aggregate the values on the basis of the newly created year and month columns.

  1. Calculate the 'AverageMonthly' value of different indicators.
SELECT 
	indicatorname, 
	year, 
	month, 
	AVG(value)
FROM
	(SELECT 
		*, 
		EXTRACT(month FROM date) AS month, 
		EXTRACT(year FROM date) AS year
	FROM afdbmarkettrends2015) AS trendswithmonthyear
GROUP BY 
	indicatorname, 
	year, 
	month
ORDER BY 
	indicatorname, 
	year, 
	month

OUTPUT:

indicatorname year month monthlyaveragevalue
Baltic Dry Index 2011 1 1401.4
Baltic Dry Index 2011 2 1181.1
Baltic Dry Index 2011 3 1492.6957
Baltic Dry Index 2011 4 1342.5556
Baltic Dry Index 2011 5 1352.4
Baltic Dry Index 2011 6 1433.2273
Baltic Dry Index 2011 7 1365.5238
Baltic Dry Index 2011 8 1386.9545
Baltic Dry Index 2011 9 1840.4091
Baltic Dry Index 2011 10 2072.4762
Baltic Dry Index 2011 11 1835.3182

The final problem is not as straightforward as the other examples. We will be finding which indicators increased the most percentage overall since the inception of the data. It has to be noted that not all indicators have the same start date.

On a per indicator basis we must calculate

  • Earliest date record of indicator as beginningvalue
  • Latest date record of indicator as endingvalue
  • Percent change of the new beginningvalue and endingvalue as percentchange

This will be done with the lag value and filtering where the beginningvalue and endingvalue match the dates of the indicator on a per indicator basis.

7 . What were the indicators that increased the most percentage OVERALL from the inception of the data?

SELECT 
	indicatorname, 
	date, 
	beginningvalue, 
	endingvalue, 
	ROUND(percentchange, 4) AS percentchange
FROM
	/* 2 At this point we will have two records for each indicator
	one with the beginningvalue matched to the date, and one with
	the ending value matched to the date.  We will use lag to
	create a record for each indicator with the beginningvalue, and 
	the endingvalue, and then we use this to calculate percentage*/
	
	(SELECT 
	 	indicatorname, 
	 	date, 
	 	LAG (value) OVER (PARTITION BY 
						  	(indicatorname) 
						ORDER BY 
						  	date) AS beginningvalue, 
	 	value AS endingvalue,
		(value - LAG (value) OVER (PARTITION BY 
							(indicatorname) 
						ORDER BY 
    						date)) 
	 		/ LAG (value) OVER (PARTITION BY 
							(indicatorname) 
						ORDER BY 
							date) * 100 AS percentchange 
		FROM
			--1  Create a mindate and maxdate for each indicator
	 		(SELECT *, 
			 	MIN(date) OVER (PARTITION BY 
							indicatorname) AS mindate, 
			 	MAX(date) OVER (PARTITION BY 
    						indicatorname) AS maxdate
			FROM 
			 	afdbmarkettrends2015
		ORDER BY 
			 date) 
	 	AS trendswithminandmax
        
		/*3 Filter for each row using the where clause,
	 	keeps only records that are the beginning and 
	 	endind dates for each record*/
	 	WHERE 
	 		date = mindate OR 
	 		date = maxdate
	ORDER BY 
	 	indicatorname, 
	 	date) 
	AS trendswithchangepercent
WHERE
	/*4 The lag creates a null value  that helps
	to remove the unnecessary record, we will filter
	to remove these records*/
	beginningvalue IS NOT NULL
ORDER BY 
	percentchange DESC
LIMIT 
	5

OUTPUT:

indicatorname date beginningvalue endingvalue percentchange
South Africa Rand 2015-07-28 6.6275 12.5681 89.6356
Cote d'Ivoire BRVM Composite Index 2015-07-28 159.32 301.22 89.066
Uganda SE All Share index USE 2015-07-28 1192.57 1893.63 58.7856
South Africa JALSH All Share Index 2015-07-28 32308.11 50758.42 57.1074
Uganda Shilling 2015-07-28 2310 3420 48.0519

There we have our final SQL analysis of this dataset.


Key takeaways:

With SQL it is easy to get the data to report what you want it to. Some of the positives are that it is quick to query, and the language is quite easy to understand for smaller queries. Conversely, while trying to accomplish more robust aggregations, the language becomes harder to follow. Also it is not a 'one stop shop' for analyzing and reporting visually. Tools like Python or R would be useful if visualizations and the same aggregations could be done.

I will be exploring this data visually with Tableau in this post.