Talend job to lookup geographic coordinates into a shape file

Introduction

Recently for an open data integration project I had to select some tools in order to be able to process geospatial data. I had a couple of choices: I could use R and try to work out a solution with the packages available on the server or use Talend. One of the biggest restrictions was, the development environment had no internet connection due to security policies and I wanted to try some options iteractively. I decided to give Talend a try and asked the system admins to install the spatial plugin. I only had tried Talend before to accomplish some exercises from the book Talend for Big Data but never used it for a “real-world” project, which was challenging but also made me feel motivated.

Software requirements

Talend open studio for big data

https://www.talend.com/download/

Spatial extension for Talend

https://talend-spatial.github.io/

The experiment

Input data

Customers coordinates: a flat file containing x,y coordinates for every customer.

Municipalities in Austria: a shape file with multi-polygons defining the municipalities areas in Austria: source

Goal

Use the x,y coordinates from the customers to “look-up” the municipality code GKZ in the shape file, which in german stand for “Gemeindekennzahl”. The idea is to determine in which municipality lies every point (customer location).

This is an overview of the overall Talend job

jobOverview
Figure 1. Talend Job Overview

Create a generic schema

crateSchema.jpg
Figure 2. Create a generic schema

Use a sShapeFileInput component

Shapefile Input.JPG
Figure 3. Shape file input

The shapefile contains multipolygons and I want to have polygons. My solution was to use an sSimplify component. I used the default settings. You may need to analyze or find in the source metadata what kind of data is available within the shape file.

The projection of the shapefile was “MGI / Austria Lambert” which corresponds to EPSG 31287. I want to re-project it as EPSG 4326 (GCS_WGS_1984) which is the one used by my input coordinates.

sProj
Figure 4. Re-project the polygons

I read the x, y coordinates from a csv file.

With a s2DPointReplacer I converted the x,y coordinates as Point(x,y) (WKT: well-known text)

PointReplacer
Figure 5. Point replacer

Finally I created an expression in a tMap just to get the polygon and point intersection. The “contains” function would also work:

tmap
Figure 6. Calculate the intersection between the input geometries

Conclusion

Talend did the job and I recommend it as an alternative not only for classical ETL projects but also to create analytical data sets to be consumed by data scientists. Sometimes data cleansing (or data munging/wrangling, or whatever you want to call it) could be cumbersome with scripting languages. With Talend the jobs are easy to understand, could be fully parameterized and reused.

References

3 responses to “Talend job to lookup geographic coordinates into a shape file”

  1. […] Paul Hernandez has a demo where he uses Talend’s product to convert latitude and longitude pai…: […]

  2. What version of Talend did you use?

    1. Hi, sorry for the delayed answer, I was on holidays 😉
      Talend Version: TOS_BD-20161216_1026-V6.3.1
      I hope this helps.
      Best, Paul

Leave a comment