Modal is co-hosting Champagne, Caviar, AI & Nuggs at RAISE Summit with our friends from Poolside, Gladia, turbopuffer and Vercel. Come join us after day two of the conference and put your taste to the test on the luxurious, delicate, beloved chicken nuggets. July 9. Come say hi 👋
Modal
Software Development
New York City, New York
26,342 followers
AI needs a new infrastructure layer. We're building it.
About us
Customers rely on Modal for instant GPU access, sub-second container starts, and native storage, so it's simple to serve low-latency inference, fine-tune models, and access production-ready sandboxes at scale. Every era of computing came with new workloads that previous infrastructure couldn't serve: mainframes, databases, the cloud. Each time, the company that rebuilt the layer underneath defined the decade. AI is no different, except it touches everything instead of one slice. The window to build is open right now.
- Website: https://modal.com
- Industry: Software Development
- Company size: 51-200 employees
- Headquarters: New York City, New York
- Type: Privately Held
- Specialties: Serverless GPUs, LLM Inference, LLM Fine-Tuning, Generative Model Inference, Generative Model Training, Computational Biology, Audio Generation, Image Generation, Video Generation, Web Scraping, Batch Jobs, Batch Embeddings, Scaling Out, AI Agents, Reinforcement Learning, Sandboxes, and Background Agents
Products
Modal
Platform as a Service (PaaS) Software
Modal is a serverless compute platform that makes it easy for developers to run compute-intensive workloads like ML inference, fine-tuning, and batch jobs. Our proprietary Rust-based container stack is best-in-class, allowing you to run any function in the cloud in less than a second, even on the most in-demand GPU types. We autoscale to thousands of GPUs or CPUs for your functions based on request volume so you can always meet customer demand while never paying for idle resources. Modal's Python SDK allows you to define custom images and hardware requirements in code. No more spending time on config files or cloud consoles. Let your team ship innovative AI products—we'll handle the compute.
Locations
- New York City, New York 10038, US (Primary)
- Stockholm, SE
- San Francisco, California 94103, US
Updates
-
The most demanding problems in life sciences need more than a capable model; they need infrastructure that scales. Today we're announcing our integration with Claude Science, bringing Modal's elastic compute to researchers when they need it. We're committing up to $100K in compute to support academic life sciences research. Apply by July 15. Read more: https://lnkd.in/gf3X5T26
-
Modal reposted this
We OCR'd 100,000 pages with open-source vision models on Modal in under an hour for about $225. The same job with GPT-5.5 would have cost close to $6,000. The lower-cost proprietary models were cheaper, but they hallucinated enough that it just wasn’t a worthwhile comparison. We wanted to know what it would take to OCR a large document corpus using open-source vision models instead of paying for proprietary APIs. What surprised us the most was that the cheapest GPU per second wasn't the cheapest GPU for the job in the end. L4s and H100s landed at roughly the same cost per page in some runs, but L4s took 5-6x longer. Timeline should be considered part of the cost. We’ve got a full write-up with methodology, model comparisons, and a side-by-side viewer to judge output quality yourself: https://lnkd.in/gv7VUyT8
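The cost comparison in this post can be worked through as a short calculation. The sketch below uses only the figures quoted above (100,000 pages, about $225, close to $6,000) for the per-page numbers. The function names and the per-second GPU rates and durations in the second half are hypothetical, chosen only to illustrate how a GPU that is cheaper per second can cost the same for the whole job when it runs about 5.5x longer.

```python
import math


def cost_per_page(total_cost_usd: float, pages: int) -> float:
    """Average cost of processing a single page."""
    return total_cost_usd / pages


def job_cost(price_per_second_usd: float, duration_seconds: float) -> float:
    """Total cost of a job billed per second of GPU time."""
    return price_per_second_usd * duration_seconds


# Figures quoted in the post.
PAGES = 100_000
open_source_per_page = cost_per_page(225, PAGES)    # about $0.00225 per page
proprietary_per_page = cost_per_page(6_000, PAGES)  # about $0.06 per page
price_ratio = proprietary_per_page / open_source_per_page  # roughly 27x

# Hypothetical rates and durations, NOT taken from the post or from any
# published price list. The slower GPU is 5.5x cheaper per second but
# takes 5.5x longer, so both jobs cost the same.
fast_rate, fast_seconds = 0.0011, 3_600
slow_rate, slow_seconds = 0.0002, 3_600 * 5.5

same_cost = math.isclose(
    job_cost(fast_rate, fast_seconds),
    job_cost(slow_rate, slow_seconds),
)

print(f"open source: ${open_source_per_page:.5f} per page")
print(f"proprietary: ${proprietary_per_page:.5f} per page")
print(f"ratio: {price_ratio:.1f}x")
print(f"equal job cost despite 5.5x longer runtime: {same_cost}")
```

With equal job cost, the only difference left between the two options is wall-clock time, which is the sense in which the post treats timeline as part of the cost.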
-
Low-latency inference demands a new serving primitive: Servers. Modal Servers are designed for applications where every millisecond counts, like LLM inference for interactive agents. Servers give you a regionalized, autoscaling pool of HTTP server replicas behind Modal’s routing layer with the deployment ergonomics, fast feedback loops, and autoscaling you know and love. We get into the specifics of how we built this in our new post: https://lnkd.in/gmDFzzGS
-
Modal Auto Endpoints provide state-of-the-art inference performance out of the box. Each Endpoint is backed by a low-latency inference playbook developed in concert with leading AI companies like Decagon, delivering responses 60ms faster than the best proprietary providers. Learn how: https://lnkd.in/g4Yfdcat
-
Modal is co-hosting The Agent Open next week in SF, across the street from AI Engineer World's Fair. Come hear CEO Erik Bernhardsson in conversation with The Pragmatic Engineer's Gergely Orosz and founders from Braintrust, LlamaIndex, turbopuffer, and Parallel Web Systems. Stick around after to settle who's the world's best dev pickleballer. June 30. RSVP 👇
-
Modal reposted this
We just launched a new product which lets customers set up managed private LLM endpoints in a few clicks (using the UI) or keystrokes (using the CLI). Super excited about this one, in particular because we're taking a slightly different approach than others. We think Modal's big differentiation is in our infrastructure. We built our own file system, container runtime, scheduler, etc. This lets us do some stuff other providers can't do – we can scale up extremely fast (using GPU snapshotting), we can run capacity all over the world for low latency, and many other things. This means we can be super open about the code running inside the containers. When you deploy an endpoint using Modal, you have full access to the code inside. We're running the latest version of vLLM/SGLang rather than a proprietary black box, and you can tweak it yourself. Our approach also lets us work in the open as we improve performance of LLM inference. We are active contributors to projects like vLLM, SGLang, and FA4. We do that because we think everyone benefits from those improvements! However, we do think Modal is the best place to deploy the code, which is why we're launching Auto Endpoints today!
It's time to actually own your inference. The best open source models, optimized out of the box with SOTA speculator models. Deploy today with Modal Auto Endpoints. Learn more: https://lnkd.in/g7Hh6cg6
-
.wait_until_ready(), set, go
Building performant sandbox systems goes way beyond the initial container boot. We're unpacking what that means, and breaking down some tools to help you manage the entire lifecycle. Read more here: https://lnkd.in/dgaethAk
-
Modal reposted this
📢 We're partnering with Modal to offer a new development and exhibition opportunity for artists with sustained engagements in artificial intelligence and the arts. This global open call seeks proposals for creative projects that demonstrate the intentional use of AI to further artistic expression.
Generative media, enabled by models trained at network-scale and proliferated by agentic systems, has opened an expansive toolkit for artists. As aesthetic practices evolve, how can artists help us interpret the new realities unlocked through scaling computation?
Against a prevailing narrative where inference equates to automation, homogenization, and slop, we think there’s another story. One where artists harness the inherent creativity of inference to expand perspectives, create novel forms, and suggest alternative ways of sensing the world.
Selected artists will receive:
📌 $2,000 honorarium
📌 Up to $5,000 in materials and production stipends
📌 Up to $20,000 value in Modal credits to use for custom model training and cloud GPU compute.
DEADLINE: July 15
Full details at https://modal.art/