Open to entry-level Data Engineer roles

Data Engineer  /  Pipelines, Big Data & Cloud

Jaber Mahmoud

I build data pipelines that turn complex, multi-source data into clean, reliable datasets.

Mathematics graduate (Data Science, USJ) who designs ETL pipelines, relational data models, and analysis-ready datasets in Python and SQL to support analytics and machine learning.

Tripoli, Lebanon / IBM Data Science certified / Python & SQL

About

A data engineer focused on reliable pipelines and clean data.

521,664
Rows of data engineered in a single pipeline
0.96
Best model accuracy achieved
9
End-to-end data projects completed
8
Certifications completed

I am a Mathematics graduate (Data Science option, USJ) specialising in data engineering. My work centres on designing data ingestion and ETL pipelines, building relational data models, and turning large, multi-source datasets into clean, well-structured tables that are ready for analysis.

I work mainly in Python and SQL, with hands-on experience integrating satellite, weather, and transactional data (including a dataset of more than 500,000 rows) and automating multi-stage processing workflows. I hold the IBM Data Science Professional Certificate and am currently preparing for Google Cloud's Cloud Data Engineer Professional Certificate.

I am looking for an entry-level Data Engineer role where well-built pipelines and well-structured data give analytics and machine learning teams a dependable foundation to work from.

What I focus on

  • ETL / ELT pipeline design and automation
  • Relational schema and data modelling
  • Data cleaning, validation and transformation
  • Multi-source ingestion and integration

Languages

Arabic: Native
English: Fluent, Professional
French: A1
German: A1, in progress

Technical Skills

The tools and technologies I work with.

Grouped across the data lifecycle, with data engineering as the core of my work.

Programming & Querying

Python, SQL, R, C++, C#, JavaScript, HTML/CSS

Databases & Storage

Relational databases, SQLite, SQL analytics, Query optimization

Cloud & Big Data

Microsoft Azure, Google Cloud, Google Earth Engine, Big-data frameworks

Data Science & ML

Pandas, NumPy, scikit-learn, XGBoost, LightGBM, TensorFlow/Keras, SHAP, NLP, Model evaluation

BI & Visualization

Power BI, DAX, Tableau, Streamlit, Matplotlib, Seaborn, Plotly, Advanced Excel

Tools

Git/GitHub, Flask, Microsoft 365 Copilot, Power Platform, Google Workspace, Canva

Selected Work

Featured projects.

A selection of projects covering data ingestion, modelling, and transformation, with analytics, dashboards, and applications built on top.

ETL Pipeline & Analytics

Retail Revenue Intelligence Platform

Built a 6-stage data pipeline on 19,960 orders (4,338 customers, GBP 10.6M revenue): ingestion, cleaning, SQL transformation in SQLite, RFM segmentation, forecasting, and reporting. Modelled 11 RFM segments and surfaced Champions driving GBP 5.7M of GBP 8.9M known-customer revenue, plus an 8-week forecast and a 15-visual Power BI layer.
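
The SQL transformation and RFM stages described above can be illustrated with a minimal sketch. This is not the project's actual code: the `orders` table, its column names, the sample rows, and the `as_of` reference date are all assumptions made for the example. It computes recency, frequency, and monetary value per customer in SQLite, which is the raw input that a later step would bin into RFM segments.

```python
import sqlite3

# In-memory database with a hypothetical orders table (schema is an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id TEXT NOT NULL,
    order_date  TEXT NOT NULL,   -- ISO 8601 date string
    amount      REAL NOT NULL
);
INSERT INTO orders VALUES
    (1, 'A', '2024-01-05', 120.0),
    (2, 'A', '2024-03-01',  80.0),
    (3, 'B', '2024-02-10',  40.0),
    (4, 'C', '2023-11-20', 300.0);
""")

# Recency = days since the customer's last order, frequency = number of
# distinct orders, monetary = total spend.
RFM_SQL = """
SELECT customer_id,
       CAST(julianday(:as_of) - julianday(MAX(order_date)) AS INTEGER) AS recency_days,
       COUNT(DISTINCT order_id) AS frequency,
       SUM(amount)              AS monetary
FROM orders
GROUP BY customer_id
ORDER BY customer_id
"""

rfm = conn.execute(RFM_SQL, {"as_of": "2024-03-31"}).fetchall()
for row in rfm:
    print(row)
```

Passing the reference date as a named parameter keeps the query reproducible: rerunning it against the same data with the same `as_of` value gives the same recency figures.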

19,960
Orders processed
11
RFM segments
6-stage
Pipeline
Python, Pandas, SQL, SQLite, Power BI, DAX

NLP Pipeline & Web App

AI-Generated Text Detection

Built a reproducible NLP data pipeline: curated and balanced a 2,400-essay corpus (human-written essays from PERSUADE 2.0 versus text generated by GPT, Copilot, DeepSeek, and Gemini), then ran preprocessing, TF-IDF feature extraction, POS tagging, and clustering. Benchmarked 5 classifiers, including a GRU; the Linear SVM reached 0.80 accuracy, 0.83 F1, and 0.99 ROC-AUC, and was deployed via a Flask app with authentication, usage quotas, and PDF/DOCX/TXT input.
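
For readers unfamiliar with the TF-IDF plus Linear SVM combination mentioned above, a minimal scikit-learn sketch follows. It is illustrative only: the four sentences and their labels are invented for the example, the n-gram range is an assumed setting, and nothing here reproduces the project's corpus, preprocessing, or reported metrics.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny made-up corpus; labels are an assumption: 0 = human-written, 1 = AI-generated.
texts = [
    "honestly i think school uniforms are kinda pointless",
    "my brother and i argued about this all weekend",
    "In conclusion, it is important to note that uniforms foster equality.",
    "Furthermore, it is important to note that this approach offers benefits.",
]
labels = [0, 0, 1, 1]

# Chaining the vectorizer and classifier means the TF-IDF vocabulary is
# learned only from the data passed to fit(), never from unseen text.
detector = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    LinearSVC(),
)
detector.fit(texts, labels)

prediction = detector.predict(["Moreover, it is important to note the benefits."])
print(prediction)
```

Wrapping both steps in one pipeline object also makes the model easy to serialise and load behind a web endpoint, since a single `predict` call handles raw text.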

2,400
Essay corpus
0.99
ROC-AUC
5
Models benchmarked
Python, scikit-learn, TensorFlow/Keras, spaCy, NLTK, Flask

Data Mining, Clustering & Prediction

Flight Delay Intelligence

Cleaned and feature-engineered historical flight records (delay-ratio, seasonal and temporal features), then applied K-Means (K=3, elbow + silhouette validated) to surface operational delay profiles, visualised with PCA. Benchmarked regression and classification models for delay prediction and delivered an interactive Streamlit dashboard.
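
The cluster-count validation described above (elbow plus silhouette, followed by a PCA projection) can be sketched as below. The data is synthetic: three well-separated blobs stand in for the engineered delay features, and the candidate range of k values is an assumption. The sketch shows the general technique, not the project's notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for engineered features: three tight, well-separated blobs.
centers = np.array([[0, 0, 0], [8, 8, 0], [0, 8, 8]], dtype=float)
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 3)) for c in centers])
X_scaled = StandardScaler().fit_transform(X)

inertias = {}     # within-cluster sum of squares, used for the elbow plot
silhouettes = {}  # mean silhouette coefficient per candidate k
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    inertias[k] = km.inertia_
    silhouettes[k] = silhouette_score(X_scaled, km.labels_)

# Pick the k with the highest silhouette; the elbow in `inertias` should agree.
best_k = max(silhouettes, key=silhouettes.get)
cluster_labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X_scaled)

# Project to two principal components for plotting the clusters.
coords_2d = PCA(n_components=2).fit_transform(X_scaled)
print(best_k, coords_2d.shape)
```

Scaling before K-Means matters because the algorithm relies on Euclidean distance, so a feature measured on a larger scale would otherwise dominate the clustering.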

Python, scikit-learn, XGBoost, K-Means, PCA, Streamlit

Full-Stack App & SQL Data Layer

Elite Kits

Designed a relational SQLite schema and seeded 330 products behind a Flask app with CSRF protection, rate limiting, soft-delete audit logs, and an admin dashboard (CSV exports, coupons, application tracking). Authored 15 SQL analytics queries covering revenue, top sellers, coupon effectiveness, and customer lifetime value.
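
To give a flavour of the relational schema and analytics queries described above, here is a small sketch using Python's built-in `sqlite3` module. The table names, columns, sample products, and the `deleted_at` soft-delete column are hypothetical and chosen only for illustration; the real Elite Kits schema is not shown in this page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    price      REAL NOT NULL CHECK (price >= 0),
    deleted_at TEXT               -- soft delete: NULL means the product is active
);
CREATE TABLE order_items (
    item_id    INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    quantity   INTEGER NOT NULL CHECK (quantity > 0)
);
INSERT INTO products (product_id, name, price) VALUES
    (1, 'Home kit', 60.0), (2, 'Away kit', 55.0), (3, 'Scarf', 15.0);
INSERT INTO order_items (product_id, quantity) VALUES
    (1, 2), (2, 1), (1, 1), (3, 4);
""")

# Top sellers by revenue, ignoring soft-deleted products.
TOP_SELLERS_SQL = """
SELECT p.name,
       SUM(oi.quantity)           AS units,
       SUM(oi.quantity * p.price) AS revenue
FROM order_items AS oi
JOIN products AS p ON p.product_id = oi.product_id
WHERE p.deleted_at IS NULL
GROUP BY p.product_id
ORDER BY revenue DESC
LIMIT :n
"""

top_sellers = conn.execute(TOP_SELLERS_SQL, {"n": 2}).fetchall()
for row in top_sellers:
    print(row)
```

Filtering on `deleted_at IS NULL` inside the query keeps reporting consistent with a soft-delete design, where rows are flagged rather than removed so that an audit trail survives.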

330
Products seeded
15
SQL analytics queries
Flask, SQLite, JavaScript, HTML/CSS

Additional projects

SpaceX Falcon 9 Launch Prediction (IBM Data Science Capstone)
LINK vs BTC Financial Analysis (Power BI)
Global Superstore Sales Analysis (BI & reporting)
Melbourne Rainfall Prediction (IBM)

Certifications

Professional certifications.

Google Cloud / Coursera

Preparing for Google Cloud Certification: Cloud Data Engineer Professional Certificate

In progress

IBM / Coursera

IBM Data Science Professional Certificate

Completed 2025

Google Cloud Fundamentals: Core Infrastructure (Coursera)
Google Prompting Essentials (Coursera)
Microsoft Azure Cloud Services (Coursera)
Tableau (Coursera)
Power BI & Power Virtual Agents (Coursera)
Microsoft Power Platform (Coursera)
Microsoft 365 Copilot (Coursera)

Education

Academic background.

Bachelor's degree in Mathematics, Data Science option

Université Saint-Joseph (USJ), North Lebanon

Jan 2023 - May 2026
3.37 / 4.0
Cumulative GPA
4.0 / 4.0
Final semester (S7) GPA

Relevant coursework

Data Mining, Artificial Intelligence, Natural Language Processing, Big Data Frameworks, Relational Databases, Data Visualization, Statistical Analysis of Data, Probability for Data Science, Data Structures & Algorithms, Advanced C++

Leadership & Activities

Leadership and activities.

UN Global Compact, Ambassador of Change

2026 - Present

Advocating the UN Global Compact's Ten Principles and Sustainable Development Goals, and supporting local engagement initiatives.

SDG Brain Lab 5.0, Delegate & Spokesperson

UN Global Compact

Led the Fair Trade Lebanon challenge and co-developed LOCAL+, a certification, QR-transparency, and retail-activation system helping Lebanese products compete with imported brands.

RoadRescue AI x RoadChip, Entrepreneurship Competition

USJ BIAT, May 2026

Developed and pitched a two-product startup concept pairing an AI roadside assistant with a Bluetooth vehicle-monitoring device. Prepared the market analysis, business model, and financial projections, and presented the pitch to the judging panel.

Global Mentorship Initiative (GMI), Mentee

Completed Dec 2025

Structured mentorship with an international industry professional on personal branding, interviewing, and global business communication.

Google Developer Student Club (GDSC), Social Media Designer

Nov 2023 - May 2024

Led visual-content strategy and event branding for a campus tech community at USJ North Lebanon.

Contact

Get in touch.

I am open to entry-level Data Engineer roles and data pipeline work. The fastest way to reach me is by email.

Want the full details?

Download my CV for a complete summary of my projects, skills, and experience.

Download CV (PDF)