The RETROFIT project (Efficient Real-Time Processing of Fast Geospatial Data) addresses open questions regarding the real-time filtering and processing of data streams. It focuses mainly on geospatial data streams, a special type of data stream whose data objects include geospatial information (e.g. latitude and longitude). Through its research objectives, the project investigates two things: the efficient matching of data objects from geospatial data streams against predefined interests in such data, and the real-time application of machine learning techniques to such data. Together, these provide a solution for the real-time processing of geospatial data streams. To cope with the high frequency of such data streams, the RETROFIT project will focus on distributed processing of data in a computer cluster. Key facts about the RETROFIT project:
|Project title|Efficient Real-Time Processing of Fast Geospatial Data|
|Project contract code||
|Duration|60 months (March 1st 2018 – March 1st 2023)|
|Croatian Science Foundation contribution||
|Principal investigator|Associate Professor Krešimir Pripužić, PhD|
The RETROFIT project has five general objectives and three direct research objectives. The general objectives are project management, establishment of the research group, setup and maintenance of the laboratory equipment, applications to other funding sources, and project dissemination and outreach. The direct research objectives are described below:
Objective O4: Development of real-time matching algorithms for fast geospatial data
Description: This research objective is to develop a model for geospatial data objects and algorithms for their efficient matching.
Expected results:
- A model of publications and subscriptions as geospatial objects,
- A survey on centralized and distributed algorithms for matching of geospatial objects,
- An efficient implementation of centralized matching algorithms (for geospatial objects),
- An efficient implementation of distributed matching algorithms (for geospatial objects) and
- An experimental evaluation showing the advantages of distributed matching algorithms over their centralized versions.
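To make the publication/subscription model more concrete, the following is a minimal Python sketch under stated assumptions: a publication is a point carrying one numeric value, and a subscription is a latitude/longitude bounding box with an optional minimum value. The names `Publication`, `Subscription` and `match_centralized` are hypothetical and not taken from the project. The matcher is the naive centralized baseline, which tests every incoming publication against every subscription.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    """A data object from a geospatial stream (hypothetical fields)."""
    pub_id: str
    lat: float
    lon: float
    value: float


@dataclass(frozen=True)
class Subscription:
    """An interest: a lat/lon bounding box plus an optional value threshold."""
    sub_id: str
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float
    min_value: float = float("-inf")

    def matches(self, pub: Publication) -> bool:
        # A publication matches if it lies inside the box and meets the threshold.
        return (self.min_lat <= pub.lat <= self.max_lat
                and self.min_lon <= pub.lon <= self.max_lon
                and pub.value >= self.min_value)


def match_centralized(pub: Publication, subscriptions: list) -> list:
    """Naive centralized matching: test the publication against every subscription."""
    return [s.sub_id for s in subscriptions if s.matches(pub)]


subs = [
    Subscription("zagreb", 45.7, 15.8, 45.9, 16.1),
    Subscription("split-hot", 43.4, 16.3, 43.6, 16.6, min_value=30.0),
]
print(match_centralized(Publication("p1", 45.81, 15.98, 21.5), subs))  # ['zagreb']
print(match_centralized(Publication("p2", 43.51, 16.44, 25.0), subs))  # []
```

The per-publication cost of this baseline grows linearly with the number of subscriptions, which is the cost that indexed and distributed matching algorithms aim to reduce.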
Impact: Using the model of publications and subscriptions as geospatial objects, we will formally describe both the incoming objects within geospatial data streams and the subscriptions that represent interest in such objects. The developed matching algorithms will be a main building block for Internet of Things platforms that work with geospatial data streams. The experimental evaluation will give an enhanced understanding of the benefits of using a computer cluster for the efficient processing of fast data.
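One common way to distribute geospatial matching across a cluster, assumed here for illustration and not necessarily the approach the project adopts, is spatial partitioning. The map is divided into grid cells, each cell is owned by one worker, a subscription is registered in every cell its bounding box overlaps, and each publication is routed to the single worker owning its cell. The sketch simulates the workers in one process; `CELL_DEG`, `Worker` and `GridCluster` are hypothetical names, and boxes are `(min_lat, min_lon, max_lat, max_lon)` tuples.

```python
import math
from collections import defaultdict

CELL_DEG = 0.5  # hypothetical grid cell size in degrees


def cell_of(lat, lon):
    """Map a coordinate to an integer (row, col) grid cell."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def cells_of_box(min_lat, min_lon, max_lat, max_lon):
    """List every grid cell overlapped by a bounding box."""
    r0, c0 = cell_of(min_lat, min_lon)
    r1, c1 = cell_of(max_lat, max_lon)
    return [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


class Worker:
    """Simulated cluster node holding only the subscriptions for the cells it owns."""

    def __init__(self):
        self.index = defaultdict(list)  # cell -> [(sub_id, box), ...]

    def register(self, cell, sub_id, box):
        self.index[cell].append((sub_id, box))

    def match(self, lat, lon):
        # Only subscriptions indexed under the publication's own cell are tested.
        hits = []
        for sub_id, (min_lat, min_lon, max_lat, max_lon) in self.index.get(cell_of(lat, lon), []):
            if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
                hits.append(sub_id)
        return hits


class GridCluster:
    """Routes subscriptions and publications to workers by grid cell."""

    def __init__(self, n_workers):
        self.workers = [Worker() for _ in range(n_workers)]

    def owner(self, cell):
        # Deterministic cell-to-worker assignment; Python's % is non-negative here.
        row, col = cell
        return self.workers[(row * 7919 + col) % len(self.workers)]

    def subscribe(self, sub_id, box):
        # Replicate the subscription into every cell its box overlaps.
        for cell in cells_of_box(*box):
            self.owner(cell).register(cell, sub_id, box)

    def publish(self, lat, lon):
        # Each publication is handled by exactly one worker.
        return self.owner(cell_of(lat, lon)).match(lat, lon)


cluster = GridCluster(n_workers=4)
cluster.subscribe("zagreb", (45.7, 15.8, 45.9, 16.1))
cluster.subscribe("croatia", (42.3, 13.4, 46.6, 19.5))
print(cluster.publish(45.81, 15.98))  # ['zagreb', 'croatia']
print(cluster.publish(43.51, 16.44))  # ['croatia']
```

Each worker scans only the subscriptions of one cell, while subscriptions covering large areas are replicated into more cells. The cell size therefore trades replication of subscriptions against the per-cell scanning cost.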
Objective O5: Application of machine learning techniques to fast geospatial data
Description: This research objective is to identify prominent case studies for applying machine learning techniques to fast geospatial data, analyse them, and apply machine learning techniques to them.
Expected results:
- A set of analysed prominent case studies (for applying machine learning techniques to fast geospatial data) and their historical geospatial data,
- A survey on machine learning techniques suitable for the previously identified case studies,
- Centralized and distributed analytical prediction models and their experimental evaluations.
Impact: Using the predictive methods, we will fill in missing values in historical geospatial data and predict future values based on current data. The distributed solution will do the same, but more efficiently. The experimental evaluation will give an enhanced understanding of the benefits of using a computer cluster for applying machine learning techniques to data streams.
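The two uses of prediction named above, filling gaps and forecasting, can be illustrated with deliberately simple baselines rather than the machine learning models the project will select. The sketch uses linear interpolation for missing values and a least-squares linear trend for the next value. It assumes readings from a single hypothetical sensor at a fixed time interval, with `None` marking missing values; the function names are illustrative.

```python
def fill_missing(series):
    """Linearly interpolate interior gaps (None) between known readings.

    Leading or trailing gaps have no neighbour on one side and are left as None.
    """
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for a, b in zip(known, known[1:]):
        step = (series[b] - series[a]) / (b - a)
        for i in range(a + 1, b):
            filled[i] = series[a] + step * (i - a)
    return filled


def predict_next(series, window=3):
    """Fit a least-squares line to the last `window` readings and extrapolate one step.

    Assumes the readings in the window contain no None values.
    """
    ys = series[-window:]
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(range(n), ys))
    slope = sxy / sxx if sxx else 0.0
    return y_mean + slope * (n - x_mean)  # value at the next step, x = n


readings = [10.0, None, None, 13.0, 14.0]
history = fill_missing(readings)
print(history)                # [10.0, 11.0, 12.0, 13.0, 14.0]
print(predict_next(history))  # 15.0
```

A learned model would replace both functions in practice. The interface of history in and filled history or next value out stays the same, and it is what a distributed version would parallelise, for example per sensor.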
Objective O6: Development of a distributed solution for real-time processing of fast geospatial data in a computer cluster
Description: This research objective is to combine the distributed implementation of the matching algorithms from objective O4 with the analytical prediction models from objective O5 to process actual fast geospatial streams from the Internet of Things in real time within the computer cluster.
Expected results:
- Parsers and connectors that deliver publicly available geospatial streams to the computer cluster in real time, and
- A distributed solution for real-time processing of fast geospatial streams within the computer cluster.
Impact: For several case studies, the developed distributed solution will demonstrate functional Internet of Things applications within the computer cluster that filter incoming data streams from the Internet of Things and apply machine learning techniques to them in real time.
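A real-time pipeline of the kind described here chains a parser/connector, a filter and a model. The following minimal sketch rests on several assumptions. It uses a hypothetical CSV line format `sensor_id,timestamp,lat,lon,value`, applies a bounding box as the filter, and replaces the machine learning step with per-sensor exponential smoothing. In a cluster, each stage would run as a distributed operator rather than as Python generators in one process.

```python
import csv
from typing import Iterable, Iterator, NamedTuple, Tuple


class Reading(NamedTuple):
    sensor_id: str
    ts: int
    lat: float
    lon: float
    value: float


Box = Tuple[float, float, float, float]  # (min_lat, min_lon, max_lat, max_lon)


def parse(lines: Iterable[str]) -> Iterator[Reading]:
    """Parser stage: turn CSV lines into typed readings, skipping malformed ones."""
    for row in csv.reader(lines):
        try:
            sensor_id, ts, lat, lon, value = row
            reading = Reading(sensor_id, int(ts), float(lat), float(lon), float(value))
        except ValueError:
            continue  # wrong number of fields or a non-numeric field
        yield reading


def in_box(box: Box, r: Reading) -> bool:
    min_lat, min_lon, max_lat, max_lon = box
    return min_lat <= r.lat <= max_lat and min_lon <= r.lon <= max_lon


def pipeline(lines: Iterable[str], box: Box, alpha: float = 0.5) -> Iterator[Tuple[str, int, float]]:
    """Parse, filter by box, then smooth per sensor (a stand-in for an ML model)."""
    smoothed = {}
    for r in parse(lines):
        if not in_box(box, r):
            continue  # filtering stage: drop readings outside the area of interest
        prev = smoothed.get(r.sensor_id, r.value)
        smoothed[r.sensor_id] = alpha * r.value + (1 - alpha) * prev
        yield r.sensor_id, r.ts, smoothed[r.sensor_id]


stream = [
    "s1,1,45.81,15.98,20.0",
    "s1,2,45.81,15.98,22.0",
    "garbled line",
    "s2,1,43.51,16.44,30.0",
]
print(list(pipeline(stream, (45.7, 15.8, 45.9, 16.1))))
# [('s1', 1, 20.0), ('s1', 2, 21.0)]
```

Because every stage is a generator, each reading flows through parsing, filtering and the model as soon as it arrives, without buffering the whole stream.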