Intelligence, everywhere.
8080 is the first cloud built from the ground up for AI-native applications. Our purpose is simple: make inference so fast and cheap that developers build AI into everything. Every feature. Every function.
The cornerstone of our infrastructure is Taalas’ next-generation inference ASICs. These chips run inference 100x faster and 2000x+ more efficiently. 8080 is the first cloud to deploy Taalas chips, and the first cloud built from the ground up around this transformative technology.
The modern AI agent pipeline doesn’t just use inference. It needs VMs, code sandboxes, audio models, embedding models, storage, search, and retrieval. We’ve colocated our inference ASICs with CPUs and GPUs on the same servers, in the same racks, and we’ve deployed these racks within 10 ms of every hyperscaler cloud region. By keeping every stage of a pipeline in-rack and just milliseconds from your core app servers or your end users, we unlock applications that simply couldn’t exist before.
We are fundamental believers in the law of demand: the cheaper inference is, the more it will be used. We’ve structured our infrastructure and company to keep costs as low as possible. So go crazy.
8080 is simple. We have two products: a high-performance inference API and a high-performance platform SDK.
┌──────────┐ ┌──────────┐ ┌──────────┐
│ │ │ │ │ │
│ │ │ │ │ │
│ │ │ │ │ │
│ │ │ │ │ │
│┌────────┐│ │┌────────┐│ │┌────────┐│
││ ASIC ││ ││ ASIC ││ ││ ASIC ││
│└────────┘│ │└────────┘│ │└────────┘│
│ │ │ │ │ │
│ │ │ │ │ │
│ │ │ │ │ │
│ │ │ │ │ │
│ │ │ │ │ │
└──────────┘ └──────────┘ └──────────┘
us-east us-central us-west
Direct access to inference ASICs through a simple API. A lightweight router lands requests on the closest healthy pool, then returns results without detours. You get high tokens per second and stable tail latency under real load.
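As a rough illustration of "closest healthy pool," the sketch below picks the healthy pool with the lowest measured round-trip time. It is not the 8080 router: the `Pool` type, the `pick_pool` function, and the latency figures are all hypothetical and exist only to show the selection rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    """Hypothetical view of one regional ASIC pool."""
    region: str
    rtt_ms: float   # measured round-trip time from the caller (made-up values)
    healthy: bool


def pick_pool(pools):
    """Return the healthy pool with the lowest round-trip time."""
    candidates = [p for p in pools if p.healthy]
    if not candidates:
        raise RuntimeError("no healthy pool available")
    return min(candidates, key=lambda p: p.rtt_ms)


pools = [
    Pool("us-east", 4.0, False),     # closest, but marked unhealthy
    Pool("us-central", 9.0, True),
    Pool("us-west", 31.0, True),
]
print(pick_pool(pools).region)
```

In this example the nearest pool is skipped because it is unhealthy, so the request lands on the next-closest one.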
┌──────────┐ ┌──────────┐ ┌──────────┐
│┌────────┐│ │┌────────┐│ │┌────────┐│
││ ASIC ││ ││ ASIC ││ ││ ASIC ││
│└────────┘│ │└────────┘│ │└────────┘│
│┌────────┐│ │┌────────┐│ │┌────────┐│
││ vGPU ││ ││ vGPU ││ ││ vGPU ││
│└────────┘│ │└────────┘│ │└────────┘│
│┌────────┐│ │┌────────┐│ │┌────────┐│
││ vCPU ││ ││ vCPU ││ ││ vCPU ││
│└────────┘│ │└────────┘│ │└────────┘│
│┌────────┐│ │┌────────┐│ │┌────────┐│
││ Disk ││ ││ Disk ││ ││ Disk ││
│└────────┘│ │└────────┘│ │└────────┘│
└──────────┘ └──────────┘ └──────────┘
us-east us-central us-west
An execution environment purpose-built for AI. CPUs, GPUs, and storage sit in-rack with our inference ASICs, keeping your agent’s tools inches from compute. Retrieval, function calls, and pre/post-processing happen on sub-millisecond hops, enabling end-to-end intelligence without bottlenecks.
Through the 8080 SDK, developers get direct control of this environment: colocated hardware, unmatched inference speed, and the ability to deploy application patterns impossible elsewhere. From batch-of-N inference to dynamic prompt routing to complete full-stack AI systems, the SDK makes it simple to build and scale next-generation AI features.
Our hardware is strategically deployed around the country, within milliseconds of your app servers at every major hyperscaler region. Logging, tool usage, structured outputs, fine-tune management, key management, code sandboxes, model routing, and much more are all supported out of the box.
Metric | Est. |
---|---|
Input Tokens Per Second | 200,000 |
Output Tokens Per Second | 20,000 |
Time to First Token (Metal) | 50 µs |
Time to First Token (Cloud) | 20 ms |
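To get a feel for what these estimates mean for a single request, here is a back-of-the-envelope calculation. It assumes the three terms simply add (time to first token, then prompt ingestion at the input rate, then generation at the output rate). That additive model is our simplification, not a published formula, and real requests may overlap these phases. The request size of 2,000 prompt tokens and 500 output tokens is an arbitrary example.

```python
# Estimates copied from the table above.
INPUT_TPS = 200_000      # input tokens per second
OUTPUT_TPS = 20_000      # output tokens per second
TTFT_CLOUD_S = 0.020     # time to first token (cloud), in seconds


def estimate_latency_s(prompt_tokens, output_tokens, ttft_s=TTFT_CLOUD_S):
    """Rough end-to-end latency under a simple additive model (an assumption)."""
    prefill_s = prompt_tokens / INPUT_TPS
    decode_s = output_tokens / OUTPUT_TPS
    return ttft_s + prefill_s + decode_s


latency = estimate_latency_s(prompt_tokens=2_000, output_tokens=500)
print(f"{latency * 1000:.1f} ms")  # 20 ms + 10 ms + 25 ms = 55.0 ms
```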
Product | Price |
---|---|
Inference – Llama 3.1 8B | $0.01 per 1M tokens, flat |
Inference – OpenAI OSS 20B | $0.05 per 1M tokens, flat |
Platform – vCPU | $0.000020 per vCPU-second |
Platform – GPU (RTX Pro 6000) | $0.00014 per GPU-second |
Platform – Memory | $0.0000025 per GiB-second |
Platform – Disk | $0.00000007 per GB-second |
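For a sense of scale, the sketch below turns the listed prices into a cost for an example workload. The prices come from the table. The dictionary keys, function names, and workload sizes are hypothetical and chosen only for illustration.

```python
# Prices copied from the table above (USD).
PRICE_PER_M_TOKENS = {
    "llama-3.1": 0.01,  # hypothetical short label for the Llama row
    "oss-20b": 0.05,    # hypothetical short label for the OpenAI OSS 20B row
}
VCPU_PER_S = 0.000020
GPU_PER_S = 0.00014
MEM_PER_GIB_S = 0.0000025
DISK_PER_GB_S = 0.00000007


def inference_cost(tokens, model):
    """Flat per-token pricing: tokens / 1M * price per 1M tokens."""
    return tokens / 1_000_000 * PRICE_PER_M_TOKENS[model]


def platform_cost(seconds, vcpus=0, gpus=0, mem_gib=0, disk_gb=0):
    """Per-second pricing summed over the resources held for `seconds`."""
    per_second = (
        vcpus * VCPU_PER_S
        + gpus * GPU_PER_S
        + mem_gib * MEM_PER_GIB_S
        + disk_gb * DISK_PER_GB_S
    )
    return seconds * per_second


# One billion tokens on the 20B model.
print(f"${inference_cost(1_000_000_000, 'oss-20b'):.2f}")
# One hour of 2 vCPUs, 1 GPU, 8 GiB of memory, and 100 GB of disk.
print(f"${platform_cost(3600, vcpus=2, gpus=1, mem_gib=8, disk_gb=100):.4f}")
```

At these prices, a billion tokens on the 20B model comes to about $50, and the example machine costs roughly $0.75 per hour.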
8080 exists to inspire new software. We can’t tell you what to build, but here are a few patterns to spark ideas.
Put AI everywhere. Wire translation, classification, extraction, redaction, dedupe, and summarization into every workflow and never babysit them again. On 8080 these schema-bound microloops hit tight p99 targets at low cost, so you can saturate your product with intelligence. Ship hundreds of set-and-forget calls that lift quality and compound across every user action.
Build ultra-realtime apps. Run your agents on racks where inference ASICs, CPUs, GPUs, and NVMe storage sit on the same top-of-rack fabric at the edge. Keep the loop local and the hops few so cold starts shrink and the first token lands fast. Launch dynamic ads, live game agents, and context-critical copilots that react in milliseconds.
Push model intelligence by turning test time into compute you control. Pair very fast models on inference ASICs with adjacent GPUs to run pass@1024-style exploration, verification, and selection inside your latency budget. Spin up synthetic data on the fly to harden prompts and policies. Build features other APIs cannot support at speed.
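The exploration, verification, and selection loop can be sketched as a generic best-of-N pattern: sample N candidates in parallel, score each with a verifier, and keep the best. This is an illustration of the pattern, not the 8080 SDK. `generate` and `verify` are stand-in stubs (a seeded random number and a toy scoring rule) where a real system would call a model and a checker.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def generate(prompt, seed):
    """Stand-in for a fast model call; returns a candidate answer (hypothetical)."""
    rng = random.Random(seed)
    return rng.randint(0, 100)


def verify(prompt, candidate):
    """Stand-in verifier; higher is better. Toy rule: closeness to 42."""
    return -abs(candidate - 42)


def best_of_n(prompt, n, max_workers=32):
    """Sample n candidates concurrently and return the highest-scoring one."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        candidates = list(pool.map(lambda seed: generate(prompt, seed), range(n)))
    return max(candidates, key=lambda c: verify(prompt, c))


best = best_of_n("toy prompt", n=1024)
print(best, verify("toy prompt", best))
```

Because every extra sample adds a model call, the pattern only fits inside an interactive latency budget when each call is fast and cheap, which is the point the paragraph above makes.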
As we build out capacity, we’ll start bringing on select customers in the fourth quarter of 2025. We’ll be adding more detail here as we get closer to launch. Until then, you can add your email here to stay in touch, and follow us on X.
The Altair 8800, introduced in 1975, is widely regarded as the spark that ignited the personal computer revolution. For the first time, individuals could own and program a real computer outside of corporate or institutional settings, collapsing the cost and physical scale barriers that had kept computing in the hands of a privileged few. Its debut inspired Bill Gates and Paul Allen to write Microsoft’s first product, and countless others to build the future. And the processor powering the 8800? The Intel 8080.
┌──────────────────────────────────────────────────────────────────────────┐
│ INTE PROT MEMR INP MI OUT HLTA STK WO INT │
│ o o o o o o o o o o │
│ D7 D6 D5 D4 D3 D2 D1 D0 │
│ o o o o o o o o │
│ A15 A14 A13 A12 A11 A10 A9 A8 A7 A6 A5 A4 A3 A2 A1 A0 │
│ o o o o o o o o o o o o o o o o │
│ | | | | | | | | | | | | | | | | │
│ | | | | | | | | | | | | | | | | │
│ 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 │
│ [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] │
│ │
│ OFF STOP STEP EXAM DEPOSIT RESET PROTECT AUX AUX │
│ [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] │
│ ON RUN EXAMNXT DEPNXT CLEAR UNPROTECT │
│ │
│ MITS ALTAIR 8800 COMPUTER │
└──────────────────────────────────────────────────────────────────────────┘
Technology is the lever with which humanity moves the world, and AI might be our biggest lever yet. It will change everything.
But there’s one huge challenge facing the next Bezos, Page, Andreessen, or Zuckerberg of the AI era: intelligence is still too slow, too expensive, and too fragmented. Modern AI chains are multi-stage, multi-pass, and multi-modal, bouncing data across CPUs, GPUs, and network. Latency and cost — not model quality — now set the ceiling on what builders can ship.
We envision a world where intelligence is everywhere, incorporated into every object, making human life easier and better every second of every day. Intelligence that is too cheap to meter and too fast to notice.
The Googles and Amazons of the AI era haven’t been built yet. They will be built on 8080, and because of 8080. We’ll be the shoulders upon which they stand.
We talk of companies being “AI-native,” but few have really scratched the surface. AI can do so much more. When model inference and the infrastructure surrounding it are fast and cheap enough, intelligence can be present in every layer of the stack and even in every function.
We want to support the most ambitious and perhaps crazy applications that will redefine “AI-native.” Somewhere, a dev is dreaming about adding AI into every frontend page render like some sort of intelligent CDN, another is dreaming about building a new Salesforce or Amazon that uses AI to do everything from search to generation to database operations, and a third is dreaming of something completely new that will change everything.
We are not building a normal company. We’ve done that before, and believe that in the AI era there must be a better way. We are designing 8080 to accomplish our mission and are capping headcount at 10 until we reach $100M in revenue. That might not be possible, but we are going to try. In order to do that:
We don’t hire employees, we hire partners. As partners, we are self-directed, self-managed, have equal control over the company’s strategic direction, and total control over our specific areas of focus. Partners solve problems without being asked.
We hire seldom, carefully, with unanimous consent, and only people who are proven top performers and great colleagues (with references to match). No managers. Just builders. A company of ten 10X builders, with 10X automation, should be as productive as 1,000 people.
Human time is precious, so we don’t waste a minute on anything not worthwhile. We use AI – built on our own platform – to automate everything we possibly can, saving our time and our customers’ time. Meetings are held only when vitally important because any minute wasted is a tragedy, whether it’s for our users or partners. We are default remote, but gather in person when it’s worth the time.
As partners, everyone is compensated highly and equally. As the company succeeds, partners will be able to participate in that success in various ways, most notably via profit sharing. We demand excellence and compensate accordingly.
If you’re interested in joining us, send us an email at join[at]8080[dot]io.
We’re looking for partners who want to build for building’s sake. Every partner is a full-stack builder first, but also brings expertise, whether from experience or passion, that augments the rest of the team. We are looking for partners with expertise in the following areas:
North Star: server utilization and latency. Constructing state-of-the-art LLM inference infrastructure from scratch that handles millions of requests per second, maximizes hardware utilization, and intelligently routes each request to the optimal edge PoP for the lowest possible latency. This includes designing and implementing the global routing engine that decides, in microseconds, where every request should execute. Leveraging expertise in high-performance, concurrent, and distributed systems; proficiency in systems programming languages like Rust, C++, or Zig; and experience with Postgres, AWS, Redis, Kafka, Zipkin, or Jaeger to architect a robust, scalable backend that integrates seamlessly with novel hardware, edge datacenters, and API services.
North Star: cost per token and revenue capacity. Managing the operations and finances of a company that is rapidly scaling hardware infrastructure and obsessed with keeping customer costs as low as possible. Controlling the end-to-end flow of capital, from equity financing to debt leverage to capex to opex to pricing strategies and customer contracts. Building automated systems to scale to hundreds of millions in revenue with very few people.
North Star: time to value. Crafting and enhancing all aspects of developer tooling and experience—from CLIs, documentation, and libraries to demos and community engagement. Building automation to support millions of developers, leveraging a passion for improving the ease with which they can build, thereby fostering a vibrant developer community.
Our favorite local dev port is :8080, especially for side projects.
The Intel 8080 was the processor that ushered in the PC revolution.
Our mission is to put AI in every software loop. 8080 has a lot of loops.
Minimal characters for devs to type.
Symmetry. Visually appealing.
Ever since reading Neal Stephenson’s Reamde, we’ve always wanted to name a company with a number (like Corporation 9592).