Developing an IoT Analytics System with MATLAB, Machine Learning and Carriots

This example shows how you can integrate Carriots with MATLAB. To illustrate the workflow we'll create a simple temperature forecasting system based on neural networks. This system will predict temperatures based on the data gathered by our own weather station in Madrid.

This data will be used as a historical feed to train a neural network that will predict the temperature.

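The workflow for developing the system consists of four steps:

  • Collect historical data
  • Analyze the data
  • Build and train a neural network
  • Prediction and visualization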

Disclaimer: The example is not intended to be a complete and correct weather study but an example of integration and data analytics using Carriots and MATLAB.

Collect historical data

We'll show how to query Carriots for historical data streams to get a few years of data.

This data has been gathered and stored on the Carriots platform by a weather station installed on our rooftop. The station samples at a rate of one stream per hour, which adds up to more than 19000 streams at the time of writing.

We wrote a MATLAB function to query a specific set of streams from Carriots. In addition to the project credentials, this function receives the starting point and the number of streams to get. It returns the streams in a MATLAB struct array.

A second function iterates over the streams collection on Carriots because, per API limitation, we cannot download more than 1000 streams in a single request. This function is fed with the project credentials and pages over the streams collection on Carriots. It returns the full collection in a MATLAB struct array.
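The paging logic described above can be sketched as follows. This is a minimal, language-neutral illustration written in Python, not the MATLAB function used in the tutorial. The names `fetch_all_streams`, `fetch_page` and `make_fake_fetcher` are hypothetical, and the real HTTP request to Carriots is deliberately replaced by an injected callable, so no API details are assumed beyond the 1000-stream cap mentioned in the text.

```python
from typing import Callable, Dict, List

PAGE_LIMIT = 1000  # per-request cap mentioned in the text


def fetch_all_streams(fetch_page: Callable[[int, int], List[Dict]],
                      page_size: int = PAGE_LIMIT) -> List[Dict]:
    """Request consecutive pages until one comes back shorter than page_size."""
    streams: List[Dict] = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        streams.extend(page)
        if len(page) < page_size:
            # A short (or empty) page means the collection is exhausted.
            break
        offset += page_size
    return streams


def make_fake_fetcher(total: int) -> Callable[[int, int], List[Dict]]:
    """Hypothetical stand-in for the real request: serves `total` dummy streams."""
    data = [{"id": i} for i in range(total)]

    def fetch_page(offset: int, limit: int) -> List[Dict]:
        return data[offset:offset + limit]

    return fetch_page


if __name__ == "__main__":
    all_streams = fetch_all_streams(make_fake_fetcher(2500))
    print(len(all_streams))
```

Injecting the page fetcher keeps the loop testable without network access; in the real function the fetcher would be the single-request query described earlier.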

Once all the data has been gathered, it must be filtered to remove wrong, incomplete and out-of-range values. We found failing streams without humidity data and with wind directions outside the valid range (0, 360), a full circle.

  • Fields not present
  • Wind direction: bad streams
  • Wind direction: raw plot

The following function checks that all fields are present and that all values are in range. Streams that fail the check are rejected and the local data collection is updated.

Now we can check that the number of streams has been reduced and that the wind direction data is in range.
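As a rough illustration of this kind of check, here is a short Python sketch. It is not the tutorial's MATLAB function. The field names (`humidity`, `wind_direction`) and the limits are assumptions made for the example, and the real station payload may use different names and cover more fields.

```python
from typing import Dict, List, Tuple

# Hypothetical field names and valid ranges; adjust to the real payload.
REQUIRED_RANGES: Dict[str, Tuple[float, float]] = {
    "humidity": (0.0, 100.0),        # percent
    "wind_direction": (0.0, 360.0),  # degrees, a full circle
}


def is_valid(stream: Dict) -> bool:
    """True when every required field is present, numeric and in range."""
    for field, (low, high) in REQUIRED_RANGES.items():
        value = stream.get(field)
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False  # missing or non-numeric field
        if not low <= value <= high:
            return False  # out of range (NaN also fails this comparison)
    return True


def filter_streams(streams: List[Dict]) -> List[Dict]:
    """Keep only the streams that pass the validity check."""
    return [s for s in streams if is_valid(s)]


if __name__ == "__main__":
    sample = [
        {"humidity": 40, "wind_direction": 180},
        {"wind_direction": 90},                   # humidity missing
        {"humidity": 55, "wind_direction": 720},  # direction out of range
    ]
    print(len(filter_streams(sample)))
```

Keeping the ranges in one table makes it easy to add further fields, such as pressure or rainfall, without touching the checking logic.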

  • Reduced streams
  • Wind direction: fixed plot

We also detected a few days without data. The lack of those streams is not a real problem for the analysis, as the neural network training is not date based and they represent a small percentage of the data.

  • Lack of streams

At this point the data is ready to be processed.

Analyze the data

The first step is to plot the data in several formats to visualize its evolution over the year.

The first plot shows the evolution of each variable independently.

  • Full plot

We found a direct dependency between UV Rad, Solar Rad and Battery level. This relation is to be expected: the sun is the main source of UV radiation, and also of electrical power, since the station is powered by a solar panel. We chose to leave the UV Rad and Battery level variables out of consideration and rely on Solar Rad alone for its influence on the temperature.

So the variables we will take into consideration are:

  • Humidity, Wind direction, Rainfall, Pressure, Solar Radiation and Wind speed as input data we'll use to predict temperature.
  • We'll use Temperature data to train and test our neural network.

Next we plot a normalized overlay of the variables we are going to use to predict temperature to see if there are any direct correlations.

  • Normalized plot
  • Normalized plot zoom
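The text does not say which normalization was used for the overlay. One common choice, assumed here, is min-max scaling, which maps every variable to the range 0 to 1 so that curves with very different units can share one axis. The Python sketch below illustrates the idea, and the function name `normalize` is hypothetical.

```python
from typing import List, Sequence


def normalize(values: Sequence[float]) -> List[float]:
    """Min-max scale a series to [0, 1]; a constant series maps to all zeros."""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        return [0.0 for _ in values]
    return [(v - low) / span for v in values]


if __name__ == "__main__":
    print(normalize([10, 15, 20]))
```

After scaling, a pressure series around 1013 and a humidity series between 0 and 100 occupy the same vertical band, which is what makes a direct visual comparison possible.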

Finally we plot a proportional view of the data. In this view the vertical scale and offset of each variable are altered to make them all fit in the graph. Its purpose is to make the shapes of the curves easy to compare, so that signal evolution, correlations and, maybe, direct dependencies stand out.

  • Proportional plot
  • Proportional plot zoom

Temperature seems to be influenced more or less directly by Solar Rad and Humidity. Pressure and Wind speed seem to be related to Rainfall, which makes sense, but we'll stick to our initial target of predicting temperatures.

Any more complex, indirect relationships between variables should be inferred internally by the neural network during the training step, so we do not spend more time analyzing the data. A more realistic and complete weather prediction system should take many other relations and variables into consideration but, as mentioned before, that's not the goal of this tutorial.

Build and train a neural network

At this point we are ready to set up the neural network.

The first step is to prepare two sets of data:

  • The one we will feed to the network, both for training and for later predictions: vector_neural_input (Humidity, Wind direction, Rainfall, Pressure, Solar Radiation and Wind speed).
  • The one with the data to be predicted, which is used to train the network: vector_neural_output (Temperature).
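A minimal Python sketch of how the two sets could be assembled from the filtered streams is shown below. The variable names vector_neural_input and vector_neural_output come from the text. The field names and the `build_vectors` helper are assumptions for illustration only, and the tutorial itself does this step in MATLAB.

```python
from typing import Dict, List, Tuple

# Hypothetical field names for the six predictors and the target.
INPUT_FIELDS = ["humidity", "wind_direction", "rainfall",
                "pressure", "solar_rad", "wind_speed"]
TARGET_FIELD = "temperature"


def build_vectors(streams: List[Dict]) -> Tuple[List[List[float]], List[float]]:
    """One input row (six values) and one target value per stream."""
    vector_neural_input = [[s[f] for f in INPUT_FIELDS] for s in streams]
    vector_neural_output = [s[TARGET_FIELD] for s in streams]
    return vector_neural_input, vector_neural_output


if __name__ == "__main__":
    demo = [{"humidity": 40, "wind_direction": 180, "rainfall": 0.0,
             "pressure": 1013, "solar_rad": 500, "wind_speed": 3.2,
             "temperature": 21.5}]
    inputs, outputs = build_vectors(demo)
    print(inputs, outputs)
```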

The second step is to set up a neural network to process our data:

A neural network is a computational model based on a large collection of simple neural units. Each unit processes its input data and passes its result on to other units until the "output" unit or units are reached.

  • Colored neural network

The MATLAB suite includes a number of applications that make it easy to deploy and train a neural network.

For our problem we chose the "Neural Network Time Series Tool", which is well suited to prediction problems on time series, where future values are based on past ones.

We chose to build a nonlinear autoregressive exogenous (NARX) neural network. A NARX network uses two input series of past data to predict future values of one of them:

  • NARX
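In its usual textbook form, a NARX model can be written as below. The symbols are assumptions introduced for this note: y is the series to predict (here, temperature), x is the exogenous input series (the six weather variables), d_y and d_x are the numbers of past steps used, and f is the nonlinear function learned by the network.

```latex
y(t) = f\bigl(y(t-1),\, y(t-2),\, \dots,\, y(t-d_y),\; x(t-1),\, x(t-2),\, \dots,\, x(t-d_x)\bigr)
```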

For the input series, Input and Target, we'll use the two vectors we prepared earlier, vector_neural_input and vector_neural_output. Both are in a "Matrix Column" configuration.

  • Select data

For the data split we keep the default values: 70% of the data is used for training, 15% for validation during training and the last 15% for final testing.

  • Validation
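To make the proportions concrete, the Python sketch below splits sample indices 70/15/15. It assumes a random assignment with a fixed seed; whether samples are assigned at random or in contiguous blocks depends on the tool's settings, so this is an illustration of the ratios rather than of the tool's exact behavior. The function name `split_indices` and the sample count of 19000 (the approximate unfiltered total quoted earlier) are used only for the example.

```python
import random
from typing import List, Tuple


def split_indices(n: int, train_pct: int = 70, val_pct: int = 15,
                  seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle indices 0..n-1 and cut them into train, validation and test parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains, about 15%
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_indices(19000)
    print(len(train), len(val), len(test))
```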

The network architecture will consist of 10 neurons in the hidden layer. We also set the number of delays to 24 steps; at one stream per hour, that takes a full day of previous data into account.

  • Architecture
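To show what the 24 delay steps mean for the data, the Python sketch below builds the samples a delayed network effectively sees: each target value is paired with the previous `delays` values of both series. This illustrates the windowing idea only; it is not how the MATLAB tool is implemented, and `build_narx_samples` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def build_narx_samples(x: Sequence, y: Sequence[float],
                       delays: int) -> List[Tuple[list, list, float]]:
    """Pair the past `delays` values of x and y with the target y[t]."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    samples = []
    for t in range(delays, len(y)):
        past_x = list(x[t - delays:t])  # x(t-delays) .. x(t-1)
        past_y = list(y[t - delays:t])  # y(t-delays) .. y(t-1)
        samples.append((past_x, past_y, y[t]))
    return samples


if __name__ == "__main__":
    hours = list(range(100))
    temps = [float(h) for h in hours]
    print(len(build_narx_samples(hours, temps, 24)))
```

Note that the first `delays` time steps produce no sample, because they lack a full window of history. This is why the outputs later have to be aligned with shifted dates.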

The last step is to define the training algorithm to be used. We chose the Bayesian Regularization training mode, a robust training algorithm that could give us good results.

  • Training mode

Then we start the training. It took around 30 minutes to finish on a mid-range computer.

  • Training

Once the training finished we got the "open" network. An open network does not feed its own predictions back as inputs, so it cannot keep predicting once the target series runs out.

  • Open Network

We can choose to close the network so that the predicted values are used as inputs. This lets it keep making predictions for as long as there are values in the input series.

  • Closed network
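The difference between the open and closed configurations can be sketched in a few lines of Python. The `model` argument stands for any trained predictor taking past inputs and past outputs; the toy model in the demo is made up for illustration and has nothing to do with the trained network. In MATLAB the conversion itself is, to our knowledge, done with the toolbox function closeloop.

```python
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float], Sequence[float]], float]


def predict_open_loop(model: Model, x: Sequence[float],
                      y: Sequence[float], delays: int) -> List[float]:
    """Every prediction uses measured past y values, so it stops when y ends."""
    return [model(x[t - delays:t], y[t - delays:t])
            for t in range(delays, len(y))]


def predict_closed_loop(model: Model, x: Sequence[float],
                        y_seed: Sequence[float], delays: int) -> List[float]:
    """Start from `delays` measured values, then feed predictions back in."""
    if len(y_seed) != delays:
        raise ValueError("y_seed must hold exactly `delays` measured values")
    history = list(y_seed)
    predictions = []
    for t in range(delays, len(x)):
        y_hat = model(x[t - delays:t], history[-delays:])
        predictions.append(y_hat)
        history.append(y_hat)  # the prediction becomes a future input
    return predictions


if __name__ == "__main__":
    def toy(past_x, past_y):
        # Toy stand-in for a trained network.
        return past_y[-1] + past_x[-1]

    print(predict_open_loop(toy, [1] * 6, [0, 1, 2, 3, 4, 5], 2))
    print(predict_closed_loop(toy, [1] * 8, [0, 1], 2))
```

The closed-loop version keeps going for as long as the exogenous series `x` has values, at the cost of accumulating its own prediction errors.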

We can also make it a Step-Ahead Prediction Network. This way we get y(t+1) at the time we get y(t). The resulting series will be the same, just one time step earlier.

  • Stepped network

We can even combine both of them.

  • Closed stepped network

We can see the code for all these transformations, and some examples, on the "Save Results" screen (the last one) of the Neural Network wizard. Just save the "Advanced Script". There you can also save the network, results and extra data into the MATLAB workspace.

We modified the script to turn it into a function so it can be used with the provided example code. This code can run all the tutorial steps automatically for you. Reading the code can also give you more insight into the process, into coding in MATLAB and into usage examples.

Prediction and visualization

Once the training is complete we can show several graphs directly available on the Neural Network Training Tool. For example the Time-Series response:

  • Training
  • Network response

This graph shows both the predicted and the measured temperatures, plus a graph of the error (the distance between both curves).

In this step we are going to generate a similar graph manually, to show how to generate a prediction with the network and process the data obtained.

First we prepare the data to be used on the network and use the already trained network to get a prediction. Then we transform the obtained data to matrix format and calculate the difference between the target curve and the predicted one. The dates vector also has to be shifted in the same way as the data.
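The alignment and error computation can be sketched as follows in Python. The sketch assumes the predictions start after the first `delays` samples (24 in this tutorial) and that the error is taken as target minus prediction; both are assumptions about the original MATLAB code, and `align_and_diff` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def align_and_diff(dates: Sequence, targets: Sequence[float],
                   predictions: Sequence[float],
                   delays: int) -> Tuple[list, List[float], List[float]]:
    """Drop the first `delays` samples, then compute target minus prediction."""
    shifted_dates = list(dates[delays:])
    shifted_targets = list(targets[delays:])
    if len(shifted_targets) != len(predictions):
        raise ValueError("predictions do not line up with the shifted targets")
    errors = [t - p for t, p in zip(shifted_targets, predictions)]
    return shifted_dates, shifted_targets, errors


if __name__ == "__main__":
    d, t, e = align_and_diff(["d0", "d1", "d2", "d3"],
                             [10, 11, 12, 13], [12.5, 12.0], 2)
    print(d, t, e)
```

The three returned series share the same length, so they can be drawn against the same date axis.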

Then we call the function we wrote to plot the data.

Now we can see the three curves overlaid on the same graph. As expected, they follow the same paths as the ones generated by the Neural Network Training Tool.

  • Network prediction
  • Network prediction zoom

Summary

At this point we have learned how to collect data from the Carriots platform using its REST API, process and analyze that data with MATLAB, train a neural network and use that network to generate and plot new predictions.

Further and deeper analysis could be done in MATLAB, but that is beyond the scope of this tutorial.

The full main function follows: