Josh Le Grice

MSc Data Science · University of Exeter

Data Engineer and Software Engineer specialising in scalable financial systems, Python, and C++.

13 Projects · 4 Featured

Technical Skills


Core Languages

Python SQL C++ R JavaScript Bash HTML LaTeX

Machine Learning & Deep Learning

Scikit-learn PyTorch LLMs Transformers (HF) RAG Optuna

Quantitative Finance

Time Series Analysis Equity Forecasting Algorithmic Trading Portfolio Optimisation Securities & Derivatives Order Book Mechanics

Software Engineering

Git Docker React Node.js FastAPI Streamlit CI/CD GitHub Actions

Data & Cloud Infrastructure

Pandas Apache Spark ETL Pipelines Azure Databricks

Featured Projects

Limit Order Book
High-Performance Limit Order Book in C++

A matching engine built in C++20 that replicates core exchange infrastructure. Implements price-time priority matching, O(1) order cancellation via hash-indexed lookup, and fixed-point integer price representation to avoid floating-point rounding errors in price-keyed maps. Includes VWAP tracking, nanosecond-resolution latency measurement, and a test suite covering limit orders, market orders, partial fills, and cancellation.
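The ideas above can be sketched in a few lines. This is an illustrative Python sketch, not the project's C++ code: the names `OrderBook`, `to_ticks`, and `TICKS_PER_UNIT` are hypothetical, cancellation is done by flagging the order and skipping it later (one of several ways to get O(1) cancels), and the best price is found with `min`/`max` for brevity where a real engine would keep price levels sorted.

```python
from collections import deque
from dataclasses import dataclass

TICKS_PER_UNIT = 100  # assumption: two decimal places of price precision


def to_ticks(price: str) -> int:
    """Parse a decimal string such as '10.05' into integer ticks (1005)."""
    whole, _, frac = price.partition(".")
    return int(whole) * TICKS_PER_UNIT + int((frac + "00")[:2])


@dataclass
class Order:
    order_id: int
    side: str           # "buy" or "sell"
    price: int          # integer ticks, exact and safe as a dict key
    qty: int
    active: bool = True


class OrderBook:
    def __init__(self):
        self.bids = {}   # price in ticks -> deque of Orders (FIFO = time priority)
        self.asks = {}
        self.index = {}  # order_id -> Order, gives O(1) lookup for cancels
        self.traded_qty = 0
        self.traded_notional = 0  # sum of fill_qty * price_in_ticks

    def cancel(self, order_id):
        """O(1) cancel: hash lookup, then flag the order as inactive."""
        order = self.index.pop(order_id, None)
        if order is None:
            return False
        order.active = False  # skipped and dropped lazily during matching
        return True

    def add_limit(self, order_id, side, price, qty):
        order = Order(order_id, side, price, qty)
        is_buy = side == "buy"
        own, opposite = (self.bids, self.asks) if is_buy else (self.asks, self.bids)
        while order.qty > 0 and opposite:
            # Price priority: the lowest ask for a buy, the highest bid for a sell.
            best = min(opposite) if is_buy else max(opposite)
            if (best > price) if is_buy else (best < price):
                break  # the book no longer crosses the incoming order
            queue = opposite[best]
            while queue and order.qty > 0:
                resting = queue[0]  # time priority: oldest order first
                if not resting.active:
                    queue.popleft()  # discard a previously cancelled order
                    continue
                fill = min(order.qty, resting.qty)
                order.qty -= fill
                resting.qty -= fill
                self.traded_qty += fill
                self.traded_notional += fill * best
                if resting.qty == 0:
                    queue.popleft()
                    self.index.pop(resting.order_id, None)
            if not queue:
                del opposite[best]
        if order.qty > 0:  # rest any unfilled remainder on the book
            own.setdefault(price, deque()).append(order)
            self.index[order_id] = order

    def vwap(self):
        """Volume-weighted average traded price, or None before any trade."""
        if self.traded_qty == 0:
            return None
        return self.traded_notional / self.traded_qty / TICKS_PER_UNIT
```

Integer ticks matter because binary floats cannot represent most decimal prices exactly (in Python, `0.1 + 0.2 == 0.3` is `False`), so two prices that should land on the same level can end up under different keys.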

C++20 Matching Engine
GitHub →
Comparing LLM Methods
Comparing LLM Methods: Parameter-Efficient Fine-tuning and Retrieval-Augmented Generation

Advanced LLM applications coursework comparing full fine-tuning vs. LoRA on DistilBERT, and implementing RAG for question answering. Achieved 89.8% accuracy with full fine-tuning and 85.7% with LoRA (49% faster, 99% fewer trainable parameters). The RAG system improved answer quality by 8.5× using retrieved context.
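The parameter saving from LoRA comes from replacing the update to a d_out × d_in weight matrix with two low-rank factors of shapes d_out × r and r × d_in. The arithmetic below is a sketch for a single matrix. The hidden size of 768 matches DistilBERT-base, but the rank of 8 is an assumed value, not the coursework's configuration, and the result is not meant to reproduce the 99% figure, which depends on which layers are adapted and which are frozen.

```python
def full_update_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole weight matrix is fine-tuned."""
    return d_in * d_out


def lora_update_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for the two LoRA factors (d_out x r and r x d_in)."""
    return rank * (d_in + d_out)


def reduction(d_in: int, d_out: int, rank: int) -> float:
    """Fraction of trainable parameters saved for this one matrix."""
    return 1 - lora_update_params(d_in, d_out, rank) / full_update_params(d_in, d_out)


hidden = 768  # DistilBERT-base hidden size
rank = 8      # assumed LoRA rank, for illustration only

print(full_update_params(hidden, hidden))        # 589824
print(lora_update_params(hidden, hidden, rank))  # 12288
print(f"{reduction(hidden, hidden, rank):.1%}")  # 97.9%
```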

PyTorch Transformers PEFT LoRA RAG DistilBERT
GitHub →
Credit Risk
Predicting Loan Defaults: Credit Risk Analysis

End-to-end credit risk pipeline on real-world loan data. Compared Logistic Regression, Random Forest, and XGBoost; addressed multicollinearity and class imbalance. XGBoost achieved the best results, with 92.8% accuracy, 0.944 AUC, and a 0.816 F1-score.

Python XGBoost Scikit-learn
MLP from scratch
Multi-Layer Perceptron for Image Classification

Implemented a full MLP in NumPy - forward pass, backpropagation, weight updates - achieving ~89% accuracy on Fashion MNIST, on par with a reference PyTorch implementation.

Python NumPy Deep Learning
GitHub →
Diabetes prediction
Predicting Diabetes: Key Factors & Model Benchmarking

Identified top clinical predictors of diabetes diagnosis and benchmarked classification models. Logistic Regression yielded the highest accuracy at 77.3%, while SVM achieved the highest AUC at 0.850. Includes feature importance analysis and model interpretability.
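Accuracy and AUC can rank models differently because accuracy depends on a fixed decision threshold, while AUC measures only how well positives are ranked above negatives. The sketch below uses invented toy scores (not the project's data) and two hypothetical helpers, `accuracy` and `roc_auc`, to show one model winning on accuracy while the other wins on AUC.

```python
def accuracy(labels, scores, threshold=0.5):
    """Share of cases classified correctly when scores >= threshold mean positive."""
    correct = sum((s >= threshold) == (y == 1) for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration.
labels = [0, 0, 0, 1, 1, 1]
model_a = [0.10, 0.20, 0.60, 0.40, 0.70, 0.90]  # reasonable threshold, one ranking mistake
model_b = [0.10, 0.20, 0.30, 0.35, 0.40, 0.45]  # perfect ranking, but every score below 0.5

for name, scores in [("A", model_a), ("B", model_b)]:
    print(name, round(accuracy(labels, scores), 3), round(roc_auc(labels, scores), 3))
# A 0.667 0.889
# B 0.5 1.0
```

Model B separates the classes perfectly but would need a lower threshold (or recalibration) for that to show up in accuracy.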

Python ML Research
KPI Dashboard
Commercial KPI Dashboard - Ekimetrics Internship

Semi-automated reporting pipeline built during internship at Ekimetrics: raw data ingestion and cleaning through to interactive Power BI dashboards for commercial KPI tracking.

Power BI Python Data Pipeline
Proprietary - not publicly accessible
Dissertation
ML & Type 1 Diabetes: A Critical Analysis

Third-year dissertation evaluating machine learning applications in T1D: glucose forecasting models, diagnostic tools, and closed-loop insulin delivery systems. Highlights include XGBoost achieving 0.99 AUC for hypoglycaemia prediction and dual-hormone systems reaching 93.1% Time in Range (TIR).

Research ML
Read report →
Ethics Report
Ethical Considerations in Automated Insulin Delivery

Evaluated the Diabeloop DBLG1 AI-driven diabetes management system. Analysed algorithmic bias, transparency, and fairness through utilitarian and principlism frameworks, proposing collaborative governance strategies.

Research AI Ethics
Read report →
Guild ETL Pipeline
Exeter Students' Guild – ETL Data Pipeline

Streamlit web app for processing student data exports using a Bronze → Silver → Gold medallion architecture. Cleans and validates records, then splits into four exports for Power BI, membership, and survey platforms. Hosted on Azure App Service.
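A medallion flow of this kind can be sketched as three small stages. The example below is a hedged illustration only: it uses plain Python dictionaries rather than Pandas to stay dependency-free, and the field names (`student_id`, `email`, `course`), the validation rules, and the two gold exports (`reporting`, `mailing`) are all hypothetical stand-ins, not the app's actual schema or its four real exports.

```python
import re

# Deliberately loose email check, for illustration only.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def to_bronze(raw_rows):
    """Bronze: keep the raw export as received (copied, otherwise untouched)."""
    return [dict(row) for row in raw_rows]


def to_silver(bronze_rows):
    """Silver: trim and normalise values, reject invalid rows, drop duplicate IDs."""
    seen_ids, silver, rejected = set(), [], []
    for row in bronze_rows:
        clean = {key: value.strip() for key, value in row.items()}
        clean["email"] = clean["email"].lower()
        if not clean["student_id"] or not EMAIL_RE.match(clean["email"]):
            rejected.append(row)  # kept aside so failures can be reviewed
            continue
        if clean["student_id"] in seen_ids:
            continue  # keep only the first record per student
        seen_ids.add(clean["student_id"])
        silver.append(clean)
    return silver, rejected


def to_gold(silver_rows):
    """Gold: shape the validated records into per-consumer exports."""
    return {
        "reporting": [{"student_id": r["student_id"], "course": r["course"]} for r in silver_rows],
        "mailing": [{"email": r["email"]} for r in silver_rows],
    }


raw = [
    {"student_id": " 001 ", "email": "A@Example.ac.uk ", "course": "Maths"},
    {"student_id": "001", "email": "a@example.ac.uk", "course": "Maths"},       # duplicate ID
    {"student_id": "002", "email": "not-an-email", "course": "Physics"},        # invalid email
    {"student_id": "003", "email": "c@example.ac.uk", "course": " History"},
]

silver, rejected = to_silver(to_bronze(raw))
gold = to_gold(silver)
print(len(silver), len(rejected))  # 2 1
```

Keeping the bronze copy unmodified means a bad cleaning rule can be fixed and the later stages re-run without asking for a fresh export.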

Python Streamlit Azure Pandas ETL Power BI
Proprietary - not publicly accessible

Certifications

IBM Data Science Certificate
IBM Data Science Professional Certificate

12-module certification covering Python, SQL, data visualisation, machine learning, and applied data science. Verified by IBM via Coursera.

View certificate →

Contact

Email is the fastest way to reach me.