by PCCW Global

Date / Time: Friday 6 November / 18:30-21:30 EET


Speaker: Costas Yiotis


This workshop will provide insights and best practices on data engineering. The aim is to enable anyone to build robust, scalable data pipelines that process enterprise network traffic data and support business analytics. Along the way, it covers core big data engineering concepts such as data ingestion, message queuing, batch vs. stream processing and analysis, and data visualization. Attendees will gain hands-on experience analyzing real network traffic in a horizontally scalable manner, using frameworks such as Apache Spark, Apache Kafka and Logstash. Finally, the workshop will touch on domain-specific peculiarities that can influence decisions in data pipeline design, such as maintaining and updating the reference data used for enrichment.

Level: Beginner / Intermediate

Target audience:
Students, Data Engineers, Big Data Analysts, Network Engineers

Audience prerequisites:
a) SW/HW:
Bring a laptop (preferably with at least 8 GB of RAM) with Docker already installed.

b) Know-how:
No prior knowledge of data engineering frameworks is necessary. However, a basic understanding of the Python programming language, Docker and computer networks will help.

Workshop materials:
    i)   Presentation on data engineering for network monitoring
    ii)  Docker Compose file containing the whole testbed infrastructure
    iii) Notebook with all the code written during the workshop

Format: 1.5 h presentation, 1.5 h hands-on session