Amazon is a ‘search’ platform. 50-70% of shoppers across categories are searchers, not browsers. Unlike ‘browse’ heavy platforms like Nykaa, Myntra, Cred and others, journeys start and end with a search or two. Being visible on searches is the game. The problem is that all top listings are advertisements you need to bid for. This performance marketing is addictive because one, it gives quick returns and two, reducing spend has a direct impact on revenue. But it is expensive if not done efficiently or if done in vanity. Thousands of brands have tried to gain traction through AMS alone and ended up in the burial ground. It’s a death spiral. The only way one can survive selling on Amazon is if a significant portion of sales comes organically. And for that one needs to rank higher organically. Amazon uses the A10 algorithm to rank products according to relevance to search. It’s an almost black box but some factors it seems to assign weights to are: 1. Search relevance: it checks keywords in the front-end, back-end, descriptions and rest of listing including richness of A+ content. 2. Consistency of sales velocity: OOS affects it badly. Fluctuations affect it badly. Grow steady and fast, preferably steady. 3. External signals: the weight of ratings, reviews and external traffic has been increased in A10 compared to A9. Not much else matters if your ratings are poor. Ratings affect the factors that follow next. A double whammy! 4. Click through rates: What % of people who saw your listing clicked on it. A function of first listing card and delivery time among others. 5. Conversion rates: What % of people who saw your listing went on to buy. 6. Seller Authority: your karma matters. Keep on doing the right things and the system rewards you. Fall into the trap of a quick buck and you fall back a couple of steps.
Performance Optimization Techniques
Explore top LinkedIn content from expert professionals.
-
My next tutorial on pretraining an LLM from scratch is now out. It starts with a step-by-step walkthrough of understanding, calculating, and optimizing the loss. After training, we update the text generation function with temperature scaling and top-k sampling. And finally, we also load openly available pretrained weights into our scratch-built model architecture. Along with this pretraining tutorial, I also have bonus material on speeding up the LLM training. These apply not just to LLMs but also to other transformer-based models like vision transformers: 1. Instead of saving the causal mask, this creates the causal mask on the fly to reduce memory usage (here it has minimal effect, but it can add up in long-context size models like Llama 3.2 with 131k-input-tokens support) 2. Use tensor cores (only works for Ampere GPUs like A100 and newer) 3. Use the fused CUDA kernels for `AdamW` by setting 4. Pre-allocate and re-use GPU memory via the pinned memory setting in the data loader 5. Switch from 32-bit float to 16-bit brain float (bfloat16) precision 6. Replace from-scratch implementations of attention mechanisms, layer normalizations, and activation functions with PyTorch counterparts that have optimized CUDA kernels 7. Use FlashAttention for more efficient memory read and write operations 8. Compile the model 9. Optimize the vocabulary size 10. After saving memory with the steps above, increase the batch size Video tutorial: https://lnkd.in/gDRycWea PyTorch speed-ups: https://lnkd.in/gChvGCJH
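As a rough illustration of the temperature scaling and top-k sampling mentioned above, here is a minimal pure-Python sketch. It is not the tutorial's code: the function name `sample_next_token`, its parameters, and the convention of treating a non-positive temperature as greedy decoding are all assumptions made for this example.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=None, rng=random):
    """Pick a token index from raw logits using top-k filtering and temperature scaling.

    Hypothetical helper for illustration only; `logits` is a plain list of floats.
    """
    if temperature <= 0:
        # Assumed convention: non-positive temperature means greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Rank token indices by score and keep only the k best (all of them if top_k is None).
    indices = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k is not None:
        indices = indices[:top_k]

    # Temperature scaling: divide the logits before the softmax.
    # Values below 1 sharpen the distribution, values above 1 flatten it.
    scaled = [logits[i] / temperature for i in indices]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]

    # random.choices normalizes the weights, so this samples from the softmax.
    return rng.choices(indices, weights=weights, k=1)[0]
```

With `top_k=1` the function always returns the highest-scoring token, and raising `temperature` spreads probability over the remaining candidates.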
-
💥 Data Engineer Interview Killer: Handling 500GB Daily with PySpark Data pros — have you ever been asked this in an interview? 👉 “How would you efficiently process a 500 GB dataset in PySpark, and how would you size your cluster?” It’s one of my favorite questions — because it blends architecture, optimization, and cost awareness into one real-world scenario. Here’s how I’d break it down 👇 💡 The 5-Step Optimization Blueprint 1️⃣ Format First — The Foundation of Speed 🚀 Action: Convert raw data (CSV/JSON) into Parquet or Delta Lake right away. Why: Columnar storage, compression, and predicate pushdown drastically cut I/O. 👉 This single step often gives the biggest performance boost. 2️⃣ Partitioning Math — Define Your Parallelism 🧮 Each Spark task should process around 128 MB. Calculation: 500 GB × 1024 MB/GB ÷ 128 MB/partition ≈ 4,000 partitions ➡️ Spark now has ~4,000 tasks to parallelize — perfect for scaling efficiently. 3️⃣ Cluster Sizing — Predictable Execution 🧠 Let’s assume: 10 worker nodes 8 cores & 32 GB RAM per node Parallelism: 10 nodes × 8 cores = 80 cores total Each core handles ~2–3 tasks → ~240 tasks concurrently Total time: 4,000 ÷ 240 ≈ 17 waves of execution At ~1–2 min per wave → ~25–30 minutes total runtime That’s how you explain both scaling and efficiency in an interview. 4️⃣ Memory Management — Avoid the Spill 💾 Plan for roughly 3× data size during joins and shuffles. Estimate: (500 GB × 3) ÷ 10 nodes = 150 GB per node With only 32 GB per node, Spark will spill to disk — which is fine if SSD-backed. For critical workloads, upgrade to 64 GB nodes to keep processing smooth. 5️⃣ Performance Tweaks — Fine-Tuning ⚙️ spark.sql.shuffle.partitions = 400 spark.sql.adaptive.enabled = True ✅ Use Broadcast Joins for small lookup tables. ✅ Implement Incremental Loads (Delta Lake makes this easy). ✅ Avoid full reloads — only process what’s changed. 
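The sizing arithmetic in steps 2 to 4 can be reproduced with a few lines of Python. This is only a sketch of the post's own back-of-the-envelope numbers; the function `size_spark_job` and its parameter names are hypothetical, and `tasks_per_core=3` mirrors the post's "~2–3 tasks per core" assumption rather than any Spark setting.

```python
import math


def size_spark_job(data_gb, partition_mb=128, nodes=10, cores_per_node=8,
                   tasks_per_core=3, shuffle_factor=3):
    """Back-of-the-envelope sizing for a Spark job (illustrative helper)."""
    # Step 2: number of partitions at roughly partition_mb per task.
    partitions = math.ceil(data_gb * 1024 / partition_mb)

    # Step 3: how many tasks are assumed to be in flight at once, and how many waves that takes.
    total_cores = nodes * cores_per_node
    concurrent_tasks = total_cores * tasks_per_core
    waves = math.ceil(partitions / concurrent_tasks)

    # Step 4: rough working set per node during joins and shuffles.
    working_set_per_node_gb = data_gb * shuffle_factor / nodes

    return {
        "partitions": partitions,
        "total_cores": total_cores,
        "concurrent_tasks": concurrent_tasks,
        "waves": waves,
        "working_set_per_node_gb": working_set_per_node_gb,
    }


estimate = size_spark_job(500)
print(estimate)
```

For 500 GB this gives 4,000 partitions, 240 assumed concurrent tasks, 17 waves, and 150 GB of working set per node, matching the figures above. Note that with Spark's default of one task per core at a time (`spark.task.cpus=1`), passing `tasks_per_core=1` gives 80 concurrent tasks and 50 waves, so the runtime estimate is sensitive to that assumption.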
🧭 The Real Data Engineering Challenge Optimizing Spark isn’t about adding more compute — it’s about finding the sweet spot between performance, cost, and scalability. 🔥 Question for you: If you got this same question in an interview — how would you size your cluster or optimize it differently? 👇 I’ll be sharing my cost–benefit breakdown in the next post — how to choose between scaling up vs scaling out for real workloads. #PySpark #ApacheSpark #Databricks #BigData #DataEngineering #Optimization #InterviewPrep #Azure
-
A Few Lessons from Deploying and Using LLMs in Production Deploying LLMs can feel like hiring a hyperactive genius intern: they dazzle users while potentially draining your API budget. Here are some insights I’ve gathered: 1. “Cheap” is a Lie You Tell Yourself: Cloud costs per call may seem low, but the overall expense of an LLM-based system can skyrocket. Fixes: - Cache repetitive queries: Users ask the same thing at least 100x/day - Gatekeep: Use cheap classifiers (BERT) to filter “easy” requests. Let LLMs handle only the complex 10% and your current systems handle the remaining 90%. - Quantize your models: Shrink LLMs to run on cheaper hardware without massive accuracy drops - Asynchronously build your caches: Pre-generate common responses before they’re requested, or fail gracefully the first time a query comes in and cache the response for the next time. 2. Guard Against Model Hallucinations: Sometimes, models express answers with such confidence that distinguishing fact from fiction becomes challenging, even for human reviewers. Fixes: - Use RAG: Just a fancy way of saying that you give your model the knowledge it requires in the prompt itself, by querying a database for semantic matches with the query. - Guardrails: Validate outputs using regex or cross-encoders to establish a clear decision boundary between the query and the LLM’s response. 3. The best LLM is often a discriminative model: You don’t always need a full LLM. Consider knowledge distillation: use a large LLM to label your data and then train a smaller, discriminative model that performs similarly at a much lower cost. 4. It's not about the model, it is about the data on which it is trained: A smaller LLM might struggle with specialized domain data, and that’s normal. Fine-tune your model on your specific data set by starting with parameter-efficient methods (like LoRA or Adapters) and using synthetic data generation to bootstrap training.
5. Prompts are the new Features: Version them, run A/B tests, and continuously refine using online experiments. Consider bandit algorithms to automatically promote the best-performing variants. What do you think? Have I missed anything? I’d love to hear your “I survived LLM prod” stories in the comments!
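The caching and gatekeeping fixes from the first lesson can be sketched as a thin routing layer in front of the model. Everything here is hypothetical: `CachedRouter`, `normalize`, and the three callables stand in for a real classifier, an existing system, and an LLM client, and the in-memory dict stands in for whatever cache a production system would use.

```python
import re
from typing import Callable, Dict


def normalize(query: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace so near-identical queries share a key."""
    cleaned = re.sub(r"[^\w\s]", "", query.lower())
    return " ".join(cleaned.split())


class CachedRouter:
    """Serve repeated queries from a cache and send only 'hard' ones to the LLM (illustrative)."""

    def __init__(self,
                 is_easy: Callable[[str], bool],
                 cheap_handler: Callable[[str], str],
                 llm_handler: Callable[[str], str]) -> None:
        self.is_easy = is_easy              # stand-in for a cheap classifier
        self.cheap_handler = cheap_handler  # stand-in for the existing system
        self.llm_handler = llm_handler      # stand-in for the expensive LLM call
        self.cache: Dict[str, str] = {}
        self.llm_calls = 0

    def answer(self, query: str) -> str:
        key = normalize(query)
        if key in self.cache:
            # Repeated query: no model call at all.
            return self.cache[key]

        if self.is_easy(query):
            response = self.cheap_handler(query)
        else:
            self.llm_calls += 1
            response = self.llm_handler(query)

        self.cache[key] = response
        return response
```

Tracking `llm_calls` makes it easy to check how much traffic actually reaches the expensive path once the cache has warmed up.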
-
In the last three months alone, over ten papers outlining novel prompting techniques were published, boosting LLMs’ performance by a substantial margin. Two weeks ago, a groundbreaking paper from Microsoft demonstrated how a well-prompted GPT-4 outperforms Google’s Med-PaLM 2, a specialized medical model, solely through sophisticated prompting techniques. Yet, while our X and LinkedIn feeds buzz with ‘secret prompting tips’, a definitive, research-backed guide aggregating these advanced prompting strategies is hard to come by. This gap prevents LLM developers and everyday users from harnessing these novel frameworks to enhance performance and achieve more accurate results. https://lnkd.in/g7_6eP6y In this AI Tidbits Deep Dive, I outline six of the best recent prompting methods: (1) EmotionPrompt - inspired by human psychology, this method utilizes emotional stimuli in prompts to gain performance enhancements (2) Optimization by PROmpting (OPRO) - a DeepMind innovation that refines prompts automatically, surpassing human-crafted ones. This paper discovered the “Take a deep breath” instruction that improved LLMs’ performance by 9%. (3) Chain-of-Verification (CoVe) - Meta's novel four-step prompting process that drastically reduces hallucinations and improves factual accuracy (4) System 2 Attention (S2A) - also from Meta, a prompting method that filters out irrelevant details prior to querying the LLM (5) Step-Back Prompting - encouraging LLMs to abstract queries for enhanced reasoning (6) Rephrase and Respond (RaR) - UCLA's method that lets LLMs rephrase queries for better comprehension and response accuracy Understanding the spectrum of available prompting strategies and how to apply them in your app can mean the difference between a production-ready app and a nascent project with untapped potential. Full blog post https://lnkd.in/g7_6eP6y
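To make the four-step structure of Chain-of-Verification concrete, here is a minimal sketch: draft an answer, plan verification questions, answer them independently, then revise. The `llm` callable is hypothetical (any function that takes a prompt and returns text), and the prompt wording is an assumption for this example, not the wording used in the paper or the blog post.

```python
from typing import Callable, List

LLM = Callable[[str], str]  # hypothetical: takes a prompt, returns the completion text


def chain_of_verification(question: str, llm: LLM) -> str:
    """Run a four-step draft / plan / verify / revise loop (illustrative prompts)."""
    # Step 1: draft a baseline answer.
    draft = llm(f"Answer the question.\nQuestion: {question}")

    # Step 2: plan verification questions that fact-check the draft.
    plan = llm(
        "List short fact-checking questions, one per line, for this answer.\n"
        f"Question: {question}\nAnswer: {draft}"
    )
    checks: List[str] = [line.strip() for line in plan.splitlines() if line.strip()]

    # Step 3: answer each verification question on its own, without showing the draft,
    # so that mistakes in the draft are not simply repeated.
    evidence = [(check, llm(f"Answer concisely.\nQuestion: {check}")) for check in checks]

    # Step 4: produce the final answer, revised in light of the verification results.
    findings = "\n".join(f"Q: {q}\nA: {a}" for q, a in evidence)
    return llm(
        "Revise the draft so it is consistent with the verification results.\n"
        f"Question: {question}\nDraft: {draft}\nVerification:\n{findings}"
    )
```

The cost is one model call for the draft, one for the plan, one per verification question, and one for the revision, which is worth weighing against the accuracy gain for a given application.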
-
In 2008, Michael Phelps won Olympic GOLD - completely blind. The moment he dove in, his goggles filled with water. But he kept swimming. Most swimmers would’ve fallen apart. Phelps didn’t - because he had trained for chaos, hundreds of times. His coach, Bob Bowman, would break his goggles, remove clocks, exhaust him deliberately. Why? Because when you train under stress, performance becomes instinct. Psychologists call this stress inoculation. When you expose yourself to small, manageable stress: - Your amygdala (fear centre) becomes less reactive. - Your prefrontal cortex (logic centre) stays calmer under pressure. Phelps had rehearsed swimming blind so often that it felt normal. He knew the stroke count. He hit the wall without seeing it. And won GOLD in world-record time. The same science is why: - Navy SEALs tie their hands and practice underwater survival. - Astronauts simulate system failures in zero gravity. - Emergency responders train inside burning buildings. And you can build it too. Here’s how: ✅ Expose yourself to small discomforts. Take cold showers. Wake up 30 minutes earlier. Speak up in meetings. The goal is to build confidence that you can handle hard things. ✅ Use quick stress resets. Try cyclic sighing: Inhale deeply through your nose. Take a second small inhale. Exhale slowly through your mouth. Repeat 3-5 times to calm your system fast. ✅ Strengthen emotional endurance. Instead of avoiding difficult conversations, hard tasks, or feedback - lean into them. Facing small emotional challenges trains you for bigger ones later. ✅ Celebrate small victories. Every time you stay calm, adapt, or keep going under pressure - recognise it. These tiny wins are building your mental "muscle memory" for resilience. As a new parent, I know my son Krish will face his own "goggles-filled-with-water" moments someday. So the best I can do is model resilience myself. Because resilience isn’t gifted - it’s trained. 
And when you train your brain for chaos, you can survive anything. So I hope you do the same. If this made you pause, feel free to repost and share the thought. #healthandwellness #mentalhealth #stress
-
LLMs are no longer just fancy autocomplete engines. We’re seeing a clear shift—from single-shot prompting to techniques that mimic 𝗮𝗴𝗲𝗻𝗰𝘆: reasoning, retrieving, taking action, and even coordinating across steps. In this visual, I’ve laid out five core prompting strategies: - 𝗥𝗔𝗚 – Brings in external knowledge, enhancing factual accuracy - 𝗥𝗲𝗔𝗰𝘁 – Enables reasoning 𝗮𝗻𝗱 acting, the essence of agentic behavior - 𝗗𝗦𝗣 – Adds directional hints through policy models - 𝗧𝗼𝗧 (𝗧𝗿𝗲𝗲-𝗼𝗳-𝗧𝗵𝗼𝘂𝗴𝗵𝘁) – Simulates branching reasoning paths, like a mini debate inside the LLM - 𝗖𝗼𝗧 (𝗖𝗵𝗮𝗶𝗻-𝗼𝗳-𝗧𝗵𝗼𝘂𝗴𝗵𝘁) – Breaks down complex thinking into step-by-step logic While not all of these are fully agentic on their own, techniques like 𝗥𝗲𝗔𝗰𝘁 and 𝗧𝗼𝗧 are clear stepping stones to 𝗔𝗴𝗲𝗻𝘁𝗶𝗰 𝗔𝗜 𝘀𝘆𝘀𝘁𝗲𝗺𝘀 — where autonomous agents can 𝗿𝗲𝗮𝘀𝗼𝗻, 𝗽𝗹𝗮𝗻, 𝗮𝗻𝗱 𝗶𝗻𝘁𝗲𝗿𝗮𝗰𝘁 𝘄𝗶𝘁𝗵 𝗲𝗻𝘃𝗶𝗿𝗼𝗻𝗺𝗲𝗻𝘁𝘀. The big picture? We’re slowly moving from "𝘱𝘳𝘰𝘮𝘱𝘵 𝘦𝘯𝘨𝘪𝘯𝘦𝘦𝘳𝘪𝘯𝘨" to "𝘤𝘰𝘨𝘯𝘪𝘵𝘪𝘷𝘦 𝘢𝘳𝘤𝘩𝘪𝘵𝘦𝘤𝘵𝘶𝘳𝘦 𝘥𝘦𝘴𝘪𝘨𝘯." And that’s where the real innovation lies.
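As one concrete picture of the reasoning-and-acting pattern, here is a minimal ReAct-style loop. It is a sketch under stated assumptions: the `llm` callable, the `tools` dictionary, and the `Action: tool[input]` / `Final Answer:` text format are illustrative conventions chosen for this example, not the API of any particular framework.

```python
import re
from typing import Callable, Dict, Optional


def react_loop(question: str,
               llm: Callable[[str], str],
               tools: Dict[str, Callable[[str], str]],
               max_steps: int = 5) -> Optional[str]:
    """Alternate model 'thoughts' with tool calls until a final answer appears."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        # The model sees the whole transcript so far and emits its next step.
        step = llm(transcript)
        transcript += step + "\n"

        final = re.search(r"Final Answer:\s*(.+)", step)
        if final:
            return final.group(1).strip()

        # Acting: run the requested tool and feed the result back as an observation.
        action = re.search(r"Action:\s*(\w+)\[(.*?)\]", step)
        if action:
            name, arg = action.groups()
            tool = tools.get(name)
            observation = tool(arg) if tool else f"unknown tool {name}"
            transcript += f"Observation: {observation}\n"

    # No final answer within the step budget.
    return None
```

The `max_steps` cap matters in practice: without it, a model that never emits a final answer would loop, and pay for model calls, indefinitely.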
-
🚀 Fixed a Frontend Latency Issue Today (With Real Numbers) Today I worked on a frontend module that was feeling slow and unresponsive. Users were experiencing delays of 3.2 seconds before the page became interactive. 🔍 What I Found (Metrics) After profiling: • JavaScript bundle size was 1.1 MB • Main-thread blocked for 1,850 ms • 12 API calls were firing instantly on page load • A heavy calculation took 420 ms on the UI thread 🛠 How I Fixed It (With Improvements) Here’s what I implemented: 1️⃣ Code Splitting • Reduced initial bundle size from 1.1 MB → 420 KB • That’s a 62% reduction 2️⃣ Lazy Loading • Deferred 6 non-critical components • Reduced first paint time by 700 ms 3️⃣ Web Workers • Moved a 420 ms calculation off the UI thread • Result: 0 ms UI blocking 4️⃣ API Debouncing • Cut 12 API calls down to 4 meaningful calls • Saved ~300 ms in network overhead 5️⃣ Preloading Critical Assets • Reduced Time to Interactive from 3.2s → 1.1s ⚡ Final Impact • Page became interactive 2.9x faster • UI responsiveness increased by 45% • Main-thread blocking dropped from 1850ms → 420ms • Overall performance score improved from 56 → 87 (Lighthouse)
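The headline percentages can be re-derived from the raw before/after numbers with a short Python check. The helper names are hypothetical, and treating 1.1 MB as 1,100 KB is an assumption (using 1,024 KB per MB shifts the bundle figure by under one percentage point).

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, in percent."""
    return (before - after) / before * 100


def speedup(before: float, after: float) -> float:
    """How many times faster `after` is than `before`."""
    return before / after


bundle_reduction = percent_reduction(1100, 420)    # KB, assuming 1.1 MB = 1100 KB
tti_speedup = speedup(3.2, 1.1)                    # Time to Interactive, seconds
blocking_reduction = percent_reduction(1850, 420)  # main-thread blocking, ms
api_call_reduction = percent_reduction(12, 4)      # API calls on page load

print(f"Bundle size: -{bundle_reduction:.0f}%")
print(f"Time to Interactive: {tti_speedup:.1f}x faster")
print(f"Main-thread blocking: -{blocking_reduction:.0f}%")
print(f"API calls on load: -{api_call_reduction:.0f}%")
```

The bundle and Time to Interactive figures come out at about 62% and 2.9x, in line with the post; the same helpers show main-thread blocking fell by roughly 77% and page-load API calls by roughly 67%.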