
Get in touch

<aside> <img src="/icons/mail_gray.svg" alt="/icons/mail_gray.svg" width="40px" /> Email

</aside>

<aside> 🤗 HuggingFace

</aside>

Research interests


Education


<aside> <img src="/icons/arrow-right_gray.svg" alt="/icons/arrow-right_gray.svg" width="40px" /> MPhil in Machine Learning and Machine Intelligence (MLMI), University of Cambridge

</aside>

<aside> <img src="/icons/arrow-right_gray.svg" alt="/icons/arrow-right_gray.svg" width="40px" /> Master of Engineering, CentraleSupélec

</aside>

<aside> <img src="/icons/arrow-right_gray.svg" alt="/icons/arrow-right_gray.svg" width="40px" /> Classe Préparatoire PTSI/PT*, Lycée Jean-Baptiste Say

</aside>

About Me


I'm a Data Scientist / ML researcher. I grew up in a small town near Disneyland Paris 🇫🇷 (pretty cool, eh?), and I've been lucky enough to travel around the world during my academic years (love you 🇧🇷🇭🇰🇬🇧).

<aside> <img src="https://cdn4.iconfinder.com/data/icons/ionicons/512/icon-social-github-512.png" alt="https://cdn4.iconfinder.com/data/icons/ionicons/512/icon-social-github-512.png" width="40px" /> GitHub

</aside>

<aside> <img src="https://i.pinimg.com/736x/92/d1/db/92d1db1521d374335498624cc95e9554.jpg" alt="https://i.pinimg.com/736x/92/d1/db/92d1db1521d374335498624cc95e9554.jpg" width="40px" /> LinkedIn

</aside>

<aside> <img src="https://cdn2.iconfinder.com/data/icons/threads-by-instagram/24/x-logo-twitter-new-brand-contained-1024.png" alt="https://cdn2.iconfinder.com/data/icons/threads-by-instagram/24/x-logo-twitter-new-brand-contained-1024.png" width="40px" /> Twitter (X)

</aside>

<aside> <img src="/icons/graduate_gray.svg" alt="/icons/graduate_gray.svg" width="40px" /> Scholar

</aside>

Work Experience


<aside> <img src="/icons/laptop_gray.svg" alt="/icons/laptop_gray.svg" width="40px" /> Experienced Data Scientist, ILLUIN Technology, Puteaux (France) [October 2023 - Present]

• Led document parsing projects (development and communication) for French clients.
• Maintained nAIxt (ILLUIN's AI orchestrator) and implemented SOTA models for document layout detection and document understanding.
• Wrote the ColPali paper (co-first author) on efficient document retrieval.

AI tasks: Document Layout Recognition, Document Parsing, Fine-tuning with LoRA, Large Language Models (LLMs), Optical Character Recognition (OCR), Retrieval Augmented Generation (RAG), Vision Language Models (VLMs)

Technologies: Angular, Docker, Document Layout Recognition, DVC, GitHub Actions, GitLab Pipelines, Google Cloud Compute (VMs), Google Cloud Storage, Helm, HuggingFace Dataset, HuggingFace Transformers, JavaScript, Kubernetes, MinIO, Postman, PyPI, PyTorch, RxJS, Temporal, TypeScript

</aside>

<aside> <img src="/icons/laptop_gray.svg" alt="/icons/laptop_gray.svg" width="40px" /> Data Scientist Intern, Veolia Asia, Hong Kong SAR (Hong Kong) [March 2022 - September 2022]

• Developed a time series forecasting model in TensorFlow to optimize chemical waste treatment facilities.
• Analyzed water consumption data in India and developed a fraud detection model with LightGBM.
• Created interactive dashboards and custom ML models for industrial process optimization.

AI tasks: Data Visualization, Fraud Detection, Time Series Forecasting

Technologies: lightgbm, Mermaid, plotly, sklearn, TensorFlow, Vertex AI, xgboost

</aside>

<aside> <img src="/icons/laptop_gray.svg" alt="/icons/laptop_gray.svg" width="40px" /> Data Scientist Intern, TinyClues (acquired by Splio), Paris (France) [June 2021 - December 2021]

• Created Periscope dashboards to monitor deployed ML models.
• Developed a Hyperparameter Tuning (HPT) tool for recommendation systems using GCP, Kubeflow, and Optuna.
• Migrated the data pipeline (ingestion, cleaning, preprocessing, HPT, deployment) for 5 clients using dbt, TensorFlow, AWS, and BigQuery (GCP). Achieved a +5% Future Gini gain on average.
• Created a data discovery and a simulated concentration tool for the DataOps team.
• Developed an explainable AI tool for the company's RecSys model.

AI tasks: Data Visualization, Explainable AI (XAI), Recommendation Systems (RecSys)

Technologies: BigQuery, dbt, Kubeflow, GitHub, Optuna, Periscope, shap, SQL, TensorFlow

</aside>

My favorite projects


Portfolio Items

Miscellaneous

🙋🏻 Who you should follow too!


https://www.youtube.com/@3blue1brown

https://www.youtube.com/@Fireship

https://huyenchip.com/

https://lilianweng.github.io/

https://x.com/_akhaliq

https://x.com/_philschmid

https://x.com/mervenoyann

📚 Technical books: my starter kit


👀 Non-AI stuff