SQL EDA - AFDB Market Trends - Part 1
I will be exploring a dataset found on data.world that was originally found on The Humanitarian Data Exchange. This dataset contains numerous market indicators of importance to the African Development Bank, along with dates and values for different indicators. Below is a snippet of the first few rows of data.
Indicator | IndicatorName | Unit | Frequency | Date | Value |
91242809 | Egypt CASE 30 Index | D | 2011-01-02 | 7082.4 | |
91242109 | Tunisia Dinar | D | 2011-01-03 | 1.4416 | |
91242209 | Platinum (USD/Troy Ounce) | USD/Troy Ounce | D | 2011-01-03 | 1781.5 |
91242309 | Kenya Nairobi SE Index- NSE 20 | D | 2011-01-03 | 4495.41 |
From this data I became curious to explore 7 different things:
- How many unique indicators are there?
- Is the number of unique indicators equal to the unique indicatorname field? If not explain the discrepancy.
- Remove all commodities from this table and put them in a separate table.
- From the remaining indicators, are there any gaps in reporting?
- Calculate the 'PercentChangeDaily' of indicators.*
- Calculate the 'AverageMonthly' value of different indicators.*
- What were the indicators that increased the most percentage OVERALL from the inception of this data.*
*The last 3 items will be covered in the next post.
For a more condensed version, view on github.
Let's start with the first question.
- How many unique indicators are there?
SELECT
COUNT(DISTINCT(indicator))
FROM
afdbmarkettrends2015
OUTPUT:
indicator_count |
31 |
Problem 1 is complete. I've answered the question, but let's answer some questions that may be implied. I will output the 31 indicators along with the indicator names. We will sort the list alphabetically by the indicatorname.
SELECT
DISTINCT(indicator),
indicatorname
FROM
afdbmarkettrends2015
ORDER BY
indicatorname
indicator | indicatorname |
91244509 | Baltic Dry Index |
91242409 | CFA zone Countries CFA Franc |
91241909 | Cocoa (USD/tonne) |
91244109 | Coffee Brazilian Naturals (US cents/tonne) |
91242709 | Coffee Robusta (US cents/tonne) |
91243909 | Copper (USD/lb) |
91244709 | Copper (USD/MT) |
91243409 | Cote d'Ivoire BRVM Composite Index |
91243709 | Cotton (USD/lb) |
91244809 | Crude Oil, Brent (USD/bbl) |
91242809 | Egypt CASE 30 Index |
91243809 | Egypt Pound |
91243609 | Europe EURO |
91244909 | Gold (USD/Troy Ounce) |
91244609 | Iron Ore (USD/Dry MT) |
91242609 | Kenya Kenyan Shilling |
91242309 | Kenya Nairobi SE Index- NSE 20 |
91242909 | Mauritius Mauritius AllShares SEMDEX |
91244009 | Mauritius Rupee |
91244309 | Morocco Casa All Share Index |
91243109 | Morocco Dirham |
91243009 | Nigeria Naira |
91242009 | Nigeria NGSE All Share Index |
91242209 | Platinum (USD/Troy Ounce) |
91245009 | Silver (USD/Troy Ounce) |
91242509 | South Africa JALSH All Share Index |
91243209 | South Africa Rand |
91242109 | Tunisia Dinar |
91243309 | Tunisia Tunis se Tnse Index TUNINDEX |
91243509 | Uganda SE All Share index USE |
91244209 | Uganda Shilling |
The second question outlines a 'sanity check' for the data. We will be summing the count of unique values in the indicator field, and the count of unique values in the indicatorname field, to make sure they are the same.
- Is the number of unique indicators equal to the unique inidcatorname field? If not explain the discrepancy.
SELECT
COUNT(DISTINCT(indicator)) AS indicator_count,
COUNT(DISTINCT(indicatorname)) AS indicator_name_count
FROM
afdbmarkettrends2015
indicator_count | indicator_name_count |
31 | 31 |
The fields are equal, this aspect of the data is 'sane'.
Then we want to separate the commodities from this data. The unit column has a currency if and only if the item is a commodity. We will use this information to extract commodities, then we will use this data to delete the commodities after the data is transported to a new table.
- Remove all commodities from this table and put them in a separate table. Then we will count the number of items in the new table.
CREATE TABLE afdbmarketcommodities2015 AS
SELECT
*
FROM
afdbmarkettrends2015
WHERE
unit IS NOT NULL;
SELECT
COUNT (*) AS newcommoditiescount
FROM
afdbmarketcommodities2015;
OUTPUT:
newcommoditiescount |
11473 |
This makes the new table from the filtered data. The data was checked for accuracy, and we now need to remove the data from the afdbmarkettrends2015 table.
DELETE FROM
afdbmarkettrends2015
WHERE
unit IS NOT NULL;
OUTPUT:
DELETE 11473 Query returned successfully in 11 msec.
Good.
When I saw the dates listed, I was genuinely curious if the days had any gaps. This will require some creative sorting by indicator and date, the creation of a new column that has the previous date that was on record. This sorting should be done separately for each indicator.
- From the remaining indicators, are there any gaps in reporting?
SELECT
indicatorname,
date,
previousday,
CAST(date AS timestamp) - CAST(previousday AS timestamp) AS timegap
FROM (
SELECT
*,
LAG(date) OVER(PARTITION BY (indicator) ORDER BY indicator, date) AS previousday
FROM afdbmarkettrends2015
ORDER BY
indicator,
date)
AS laggedtable
OUTPUT (small summary):
indicatorname | date | previousday | timegap |
Nigeria NGSE All Share Index | 2011-01-04 | NULL | NULL |
Nigeria NGSE All Share Index | 2011-01-05 | 2011-01-04 | 1 day |
Nigeria NGSE All Share Index | 2011-01-06 | 2011-01-05 | 1 day |
Nigeria NGSE All Share Index | 2011-01-07 | 2011-01-06 | 1 day |
Nigeria NGSE All Share Index | 2011-01-10 | 2011-01-07 | 3 days |
Of interest in the same output is that the counting starts over when a new indicator is listed.
indicatorname | date | previousday | timegap |
Nigeria NGSE All Share Index | 2015-07-27 | 2015-07-24 | 3 days |
Nigeria NGSE All Share Index | 2015-07-28 | 2015-07-27 | 1 day |
Tunisia Dinar | 2011-01-03 | NULL | NULL |
Tunisia Dinar | 2011-01-04 | 2011-01-03 | 1 day |
Tunisia Dinar | 2011-01-05 | 2011-01-04 | 1 day |
Tunisia Dinar | 2011-01-06 | 2011-01-05 | 1 day |
Tunisia Dinar | 2011-01-07 | 2011-01-06 | 1 day |
Tunisia Dinar | 2011-01-10 | 2011-01-07 | 3 days |
Good
Let's look at another
indicatorname | date | previousday | timegap |
Tunisia Dinar | 2015-07-27 | 2015-07-24 | 3 days |
Tunisia Dinar | 2015-07-28 | 2015-07-27 | 1 day |
Kenya Nairobi SE Index- NSE 20 | 2011-01-03 | NULL | NULL |
Kenya Nairobi SE Index- NSE 20 | 2011-01-04 | 2011-01-03 | 1 day |
Kenya Nairobi SE Index- NSE 20 | 2011-01-05 | 2011-01-04 | 1 day |
Kenya Nairobi SE Index- NSE 20 | 2011-01-06 | 2011-01-05 | 1 day |
Kenya Nairobi SE Index- NSE 20 | 2011-01-07 | 2011-01-06 | 1 day |
Kenya Nairobi SE Index- NSE 20 | 2011-01-10 | 2011-01-07 | 3 days |
Still Good
We will continue with tasks 5-7 in another post.
I will also be exploring this data visually with Tableau in this post if you wanted to skip ahead.