Kartikey Joshi

B.Tech CSE student specializing in AI & ML, graduating in 2027. I'm interested in the mathematical and low-level side of deep learning: building models from scratch, understanding what's inside them, and working on problems where physics and neural networks intersect.

Currently learning CUDA and working on 3D Gaussian Splatting.


Projects

Marcella ★ 2
A ~66M parameter decoder-only transformer built entirely from scratch in PyTorch (no Hugging Face, no shortcuts). Implements RoPE, RMSNorm, SwiGLU FFN, Flash SDP attention with custom causal masking, and a per-layer KV cache with pre-allocated fixed-size tensors, so decoding writes into existing buffers instead of reallocating at every step. Trained on a weighted mix of FineWeb-Edu, Wikipedia, and SlimPajama with a custom SentencePiece tokenizer (32K vocab). Instruction-finetuned with response-only loss masking. Evaluated at perplexity 32.87 on a held-out split. Ships with a FastAPI streaming backend and a Svelte chat UI.
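
Response-only loss masking, as used in the instruction-finetuning step, can be sketched in a few lines. This is a minimal, dependency-free illustration and not the repository's actual code: the function name `build_training_pair` and the token IDs are hypothetical, and the `-100` sentinel is chosen to match the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
IGNORE_INDEX = -100  # matches the default ignore_index of torch.nn.CrossEntropyLoss


def build_training_pair(prompt_ids, response_ids):
    """Return (inputs, targets) for next-token prediction.

    Hypothetical helper: prompt positions are masked in the targets,
    so only response tokens contribute to the loss.
    """
    token_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    # Shift by one: the model at position t predicts the token at t + 1.
    inputs = token_ids[:-1]
    targets = labels[1:]
    return inputs, targets


# Illustrative token IDs only; a real run would use the SentencePiece tokenizer.
inputs, targets = build_training_pair([5, 6, 7], [8, 9, 2])
supervised = [t for t in targets if t != IGNORE_INDEX]
```

Whether the one-token shift happens during data preparation (as here) or inside the model's loss computation is a design choice; the sketch assumes the former.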

FWI ★ 13
Physics-Informed GAN for Elastic Full Waveform Inversion — reconstructing subsurface Earth properties (Vp, Vs, density, Poisson's ratio, Young's modulus) from multi-component seismic waveforms. The generator is a U-Net that maps waveform inputs [B, 10, 1000, 70] to 70×70 subsurface grids; a Fourier Neural Operator acts as the differentiable elastic wave solver; a WGAN discriminator enforces realism. Total loss combines adversarial, data misfit (MSE), and PDE residual terms. Uses the ECFB dataset from the SMILE team. (In progress)
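
The way the three loss terms might combine for the generator can be sketched as a weighted sum. This is an illustration under stated assumptions, not the project's implementation: the weights `w_adv`, `w_data`, `w_pde`, the function names, and the choice to compare simulated against observed waveforms for the data misfit are all hypothetical. The adversarial term uses the standard WGAN generator objective (negative mean critic score on generated samples).

```python
def mean(values):
    return sum(values) / len(values)


def mse(predicted, target):
    """Mean squared error between two equal-length sequences."""
    return mean([(p - t) ** 2 for p, t in zip(predicted, target)])


def generator_total_loss(critic_scores_fake, simulated, observed, pde_residual,
                         w_adv=1.0, w_data=1.0, w_pde=1.0):
    """Weighted sum of adversarial, data-misfit, and PDE-residual terms.

    All weights are assumed hyperparameters, not values from the project.
    """
    adversarial = -mean(critic_scores_fake)        # WGAN generator term
    data_misfit = mse(simulated, observed)         # waveform mismatch (MSE)
    physics = mean([r * r for r in pde_residual])  # mean squared PDE residual
    return w_adv * adversarial + w_data * data_misfit + w_pde * physics


# Toy numbers, chosen only to make the arithmetic easy to follow.
loss = generator_total_loss(
    critic_scores_fake=[0.5, 1.5],
    simulated=[1.0, 2.0],
    observed=[1.0, 4.0],
    pde_residual=[0.0, 2.0],
    w_adv=1.0, w_data=0.5, w_pde=0.25,
)
```

In a real training loop the inputs would be tensors and the residual would come from the wave solver; the flat lists here only show how the terms are weighted.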

Vision-Transformer-for-DeepFake-Detection ★ 1
ViT-based deepfake detector with self-supervised pretraining via masked image modeling on CelebA, finetuned on DFDC. Achieves ~85% accuracy and ROC-AUC ~0.93. Includes Grad-CAM to validate that detections focus on manipulated facial regions rather than background artifacts.
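
The masking step at the heart of masked image modeling can be sketched as choosing a random subset of patch indices to hide. The patch count (196, i.e. a 14×14 grid) and the 75% mask ratio below are assumptions for illustration; the source does not state which values the project uses, and `random_patch_mask` is a hypothetical name.

```python
import random


def random_patch_mask(num_patches, mask_ratio, seed=None):
    """Return a list of booleans, True where a patch is hidden from the encoder."""
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    rng = random.Random(seed)
    num_masked = int(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [index in masked for index in range(num_patches)]


# Assumed setup: 14 x 14 = 196 patches, 75% of them masked.
mask = random_patch_mask(num_patches=196, mask_ratio=0.75, seed=0)
visible_indices = [i for i, hidden in enumerate(mask) if not hidden]
```

During pretraining the model would see only the visible patches and be trained to reconstruct the hidden ones.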

4x-Upscaler-ESRGAN
ESRGAN for 4× image super-resolution. Two-phase training: PSNR-optimised first, then adversarial + VGG perceptual loss. RRDBNet with 23 RRDB blocks. Tile-based inference reduces peak GPU memory from ~4.5 GB to ~500 MB (≈89% reduction) without degrading output quality.
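
Tile-based inference amounts to splitting the input into overlapping boxes, upscaling each box separately, and stitching the results. The sketch below computes only the box coordinates; the tile size, the overlap, and the function names are assumptions rather than the repository's settings. It also checks the quoted memory figure.

```python
def tile_starts(size, tile, overlap):
    """Start offsets along one axis so that tiles cover [0, size)."""
    stride = tile - overlap
    starts = []
    position = 0
    while True:
        starts.append(position)
        if position + tile >= size:
            break
        position += stride
    return starts


def tile_boxes(height, width, tile=256, overlap=16):
    """Return (top, left, bottom, right) boxes covering the whole image."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    return [
        (top, left, min(top + tile, height), min(left + tile, width))
        for top in tile_starts(height, tile, overlap)
        for left in tile_starts(width, tile, overlap)
    ]


# Small toy image so the layout is easy to inspect.
boxes = tile_boxes(height=10, width=7, tile=4, overlap=1)

# Memory figures quoted above: ~4.5 GB down to ~0.5 GB.
reduction = 1 - 0.5 / 4.5
```

For a 4× model each output box is the input box scaled by 4 in both dimensions. How the overlapping borders are blended or cropped is left out here.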

NeuralStyleTransfer
Neural style transfer using VGG19 with multi-scale pyramid optimisation and L-BFGS refinement, balancing content, style, and total variation loss.
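
Two of the loss ingredients named above can be sketched without any framework: the Gram matrix that underlies the style loss, and the total variation penalty. These are generic textbook forms, not the project's code. Dividing the Gram matrix by the number of spatial positions is one common normalisation and is an assumption here, as are the function names.

```python
def gram_matrix(features):
    """Channel-by-channel correlations of a feature map.

    features: list of C channels, each a flat list of N activations.
    """
    num_positions = len(features[0])
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) / num_positions
         for row_j in features]
        for row_i in features
    ]


def total_variation(image):
    """Sum of absolute differences between horizontal and vertical neighbours.

    image: 2D list of shape H x W (single channel).
    """
    height, width = len(image), len(image[0])
    total = 0.0
    for y in range(height):
        for x in range(width):
            if x + 1 < width:
                total += abs(image[y][x + 1] - image[y][x])
            if y + 1 < height:
                total += abs(image[y + 1][x] - image[y][x])
    return total


gram = gram_matrix([[1.0, 2.0], [3.0, 4.0]])
tv = total_variation([[0.0, 1.0], [2.0, 4.0]])
```

The style loss would then compare Gram matrices of the generated and style images at several VGG19 layers, with the total variation term added to discourage high-frequency noise.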


Currently

  • Learning CUDA — kernels, memory hierarchies, warp-level operations
  • Exploring 3D Gaussian Splatting for real-time radiance field rendering

Stack

Python · PyTorch · CUDA · C++ · FastAPI

Computer Vision · Language Modelling · Physics-Informed Neural Networks · Fourier Neural Operators · GANs · Self-Supervised Learning · Flash Attention · KV Cache · SentencePiece
