SQL EDA - AFDB Market Trends - Part 2

I will continue exploring a dataset found on data.world that was originally found on The Humanitarian Data Exchange. This dataset contains numerous market indicators of importance to the African Development Bank, along with dates and values for different indicators. Below is a snippet of the first few rows of data.
Indicator | IndicatorName | Unit | Frequency | Date | Value |
91242809 | Egypt CASE 30 Index | D | 2011-01-02 | 7082.4 | |
91242109 | Tunisia Dinar | D | 2011-01-03 | 1.4416 | |
91242209 | Platinum (USD/Troy Ounce) | USD/Troy Ounce | D | 2011-01-03 | 1781.5 |
91242309 | Kenya Nairobi SE Index- NSE 20 | D | 2011-01-03 | 4495.41 |
From this data I became curious to explore 7 different things:
- How many unique indicators are there?
- Is the number of unique indicators equal to the unique indicatorname field? If not explain the discrepancy.
- Remove all commodities from this table and put them in a separate table.
- From the remaining indicators, are there any gaps in reporting?
- Calculate the 'PercentChangeDaily' of indicators.
- Calculate the 'AverageMonthly' value of different indicators.
- What were the indicators that increased the most percentage OVERALL from the inception of this data.
I have explored the first four tasks in another post. I will finish the final three tasks of this project now.
Our fist task will be to create a new field for our table. This will be a comparison to previous dates, and will be on a per indicator basis, meaning the data must be partitioned, sorted, and lagged.
- Create a new field for "PercentChangeDaily"
SELECT
indicatorname,
date,
value,
yesterday,
ROUND(((yesterday - value) / value * 100), 5) AS percentchange
FROM(
SELECT
*,
LAG(value) OVER (PARTITION BY
indicatorname
ORDER BY
indicatorname,
date) AS yesterday
FROM
afdbmarkettrends2015
ORDER BY
indicatorname,
date) AS yesterdaystable
OUTPUT:
indicatorname | date | value | yesterday | percentchange |
Baltic Dry Index | 2011-01-04 | 1693 | NULL | NULL |
Baltic Dry Index | 2011-01-05 | 1621 | 1693 | 4.4417 |
Baltic Dry Index | 2011-01-06 | 1544 | 1621 | 4.98705 |
Baltic Dry Index | 2011-01-07 | 1519 | 1544 | 1.64582 |
Baltic Dry Index | 2011-01-10 | 1495 | 1519 | 1.60535 |
Baltic Dry Index | 2011-01-11 | 1480 | 1495 | 1.01351 |
Baltic Dry Index | 2011-01-12 | 1453 | 1480 | 1.85822 |
Baltic Dry Index | 2011-01-13 | 1438 | 1453 | 1.04312 |
Baltic Dry Index | 2011-01-14 | 1439 | 1438 | -0.06949 |
Baltic Dry Index | 2011-01-17 | 1439 | 1439 | 0 |
Baltic Dry Index | 2011-01-18 | 1432 | 1439 | 0.48883 |
This output creates a new column with yesterdays values, and uses that new column to calculate the percent change from the previous date on a per indicatorname basis.
The next task is to create an average monthly value field. To do this I will extract the month and year from the date column. And i will then aggregate the values on the basis of the newly created year and month columns.
- Calculate the 'AverageMonthly' value of different indicators.
SELECT
indicatorname,
year,
month,
AVG(value)
FROM
(SELECT
*,
EXTRACT(month FROM date) AS month,
EXTRACT(year FROM date) AS year
FROM afdbmarkettrends2015) AS trendswithmonthyear
GROUP BY
indicatorname,
year,
month
ORDER BY
indicatorname,
year,
month
OUTPUT:
indicatorname | year | month | monthlyaveragevalue |
Baltic Dry Index | 2011 | 1 | 1401.4 |
Baltic Dry Index | 2011 | 2 | 1181.1 |
Baltic Dry Index | 2011 | 3 | 1492.6957 |
Baltic Dry Index | 2011 | 4 | 1342.5556 |
Baltic Dry Index | 2011 | 5 | 1352.4 |
Baltic Dry Index | 2011 | 6 | 1433.2273 |
Baltic Dry Index | 2011 | 7 | 1365.5238 |
Baltic Dry Index | 2011 | 8 | 1386.9545 |
Baltic Dry Index | 2011 | 9 | 1840.4091 |
Baltic Dry Index | 2011 | 10 | 2072.4762 |
Baltic Dry Index | 2011 | 11 | 1835.3182 |
The final problem is not as straightforward as the other examples. We will be finding which indicators increased the most percentage overall since the inception of the data. It has to be noted that not all indicators have the same start date.
On a per indicator basis we must calculate
- Earliest date record of indicator as beginningvalue
- Latest date record of indicator as endingvalue
- Percent change of the new beginningvalue and endingvalue as percentchange
This will be done with the lag value and filtering where the beginningvalue and endingvalue match the dates of the indicator on a per indicator basis.
7 . What were the indicators that increased the most percentage OVERALL from the inception of the data?
SELECT
indicatorname,
date,
beginningvalue,
endingvalue,
ROUND(percentchange, 4) AS percentchange
FROM
/* 2 At this point we will have two records for each indicator
one with the beginningvalue matched to the date, and one with
the ending value matched to the date. We will use lag to
create a record for each indicator with the beginningvalue, and
the endingvalue, and then we use this to calculate percentage*/
(SELECT
indicatorname,
date,
LAG (value) OVER (PARTITION BY
(indicatorname)
ORDER BY
date) AS beginningvalue,
value AS endingvalue,
(value - LAG (value) OVER (PARTITION BY
(indicatorname)
ORDER BY
date))
/ LAG (value) OVER (PARTITION BY
(indicatorname)
ORDER BY
date) * 100 AS percentchange
FROM
--1 Create a mindate and maxdate for each indicator
(SELECT *,
MIN(date) OVER (PARTITION BY
indicatorname) AS mindate,
MAX(date) OVER (PARTITION BY
indicatorname) AS maxdate
FROM
afdbmarkettrends2015
ORDER BY
date)
AS trendswithminandmax
/*3 Filter for each row using the where clause,
keeps only records that are the beginning and
endind dates for each record*/
WHERE
date = mindate OR
date = maxdate
ORDER BY
indicatorname,
date)
AS trendswithchangepercent
WHERE
/*4 The lag creates a null value that helps
to remove the unnecessary record, we will filter
to remove these records*/
beginningvalue IS NOT NULL
ORDER BY
percentchange DESC
LIMIT
5
OUTPUT:
indicatorname | date | beginningvalue | endingvalue | percentchange |
South Africa Rand | 2015-07-28 | 6.6275 | 12.5681 | 89.6356 |
Cote d'Ivoire BRVM Composite Index | 2015-07-28 | 159.32 | 301.22 | 89.066 |
Uganda SE All Share index USE | 2015-07-28 | 1192.57 | 1893.63 | 58.7856 |
South Africa JALSH All Share Index | 2015-07-28 | 32308.11 | 50758.42 | 57.1074 |
Uganda Shilling | 2015-07-28 | 2310 | 3420 | 48.0519 |
There we have our final SQL analysis of this dataset.
Key takeaways:
With SQL it is easy to get the data to report what you want it to. Some of the positives are that it is quick to query, and the language is quite easy to understand for smaller queries. Conversely, while trying to accomplish more robust aggregations, the language becomes harder to follow. Also it is not a 'one stop shop' for analyzing and reporting visually. Tools like Python or R would be useful if visualizations and the same aggregations could be done.
I will be exploring this data visually with Tableau in this post.