conorbergin a day ago

Zig is becoming a real do-everything tool for me. I was learning gpu programming recently and found you could compile it to SPIR-V!

  • AndyKelley a day ago

    Oh yeah, everyone is sleeping on this use case. Plenty of work to go before we can advise people to try it out yet, but being able to share packed structs, enums, and compile-time logic across CPU and GPU code is going to be, quite literally, a game changer.
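
    A rough sketch of the idea (hypothetical file and field names, and the SPIR-V backend details are still in flux): one plain Zig file defines the layout and compile-time constants, and both the CPU program and the kernel compiled for a SPIR-V target import it.

        // particle.zig -- hypothetical module imported by both the host build
        // and the GPU build, so the two sides always agree on layout and constants.
        const std = @import("std");

        pub const Particle = packed struct {
            alive: bool,
            kind: enum(u7) { dust, spark, smoke },
            x: f16,
            y: f16,
            vx: f16,
            vy: f16,
        };

        pub const max_particles = 4096;

        // Shared compile-time logic, e.g. a layout check that runs in both builds.
        comptime {
            std.debug.assert(@bitSizeOf(Particle) == 72);
        }

        // Host code and kernel code would both just:
        //     const shared = @import("particle.zig");
        //     var particles: [shared.max_particles]shared.Particle = undefined;

    The GPU entry point itself still gets built for a SPIR-V target, but the packed layout and the comptime checks come along for free.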

    • miki123211 21 hours ago

      Is this just for graphics programming, or are ML / scientific computing use cases (compiling to Cuda / SASS) considered too?

    • sgt a day ago

      Forgive my ignorance but what would be some practical use cases for this, for someone who hasn't been doing any GPU programming to understand? I guess Machine Learning?

      • wiz21c a day ago

        Currently, when you want to write a shader, you have to use a specific language for it (GLSL or some other). That's a pain because that language is C-like and most likely pretty close to your host language (C, Rust, Zig, whatever). Moreover, you must pass information to these shaders (uniforms), which means you have to write code to copy from your host language's data structures to the shaders' data structures (which, once again, are pretty close to those in your host language). Shader languages don't usually have an "import" mechanism, so building with them is painful. And their syntax is very bare-bones, so having syntactic sugar coming from the host language would be cool.

        So yeah, writing shaders in something other than GLSL or WGSL would make our lives so much easier...
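
        To make the duplication concrete, here's a hedged sketch (hypothetical names, std140 layout assumed): today the host side has to hand-maintain a struct that mirrors the GLSL uniform block; if the shader were written in the host language, both sides could import a single declaration.

            // The GLSL uniform block maintained in the shader source today:
            //
            //     layout(std140) uniform Scene {
            //         mat4  mvp;
            //         vec4  light_dir;
            //         float time;
            //     };
            //
            // ...and the host-side mirror (Zig here, hypothetical names) that has to
            // be kept in sync by hand so its bytes can be copied into the uniform buffer:
            pub const Scene = extern struct {
                mvp: [16]f32, // column-major mat4
                light_dir: [4]f32,
                time: f32,
                _pad: [3]f32 = .{ 0, 0, 0 }, // std140 rounds the block up to a 16-byte multiple
            };

        Change a field in the shader and forget the mirror, and you get silently corrupted uniforms; sharing one definition removes that whole class of bug.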

    • andyfleming a day ago

      Is that similar to the functionality they're targeting with Mojo?

ummonk a day ago

I’m a little confused reading this. Is MIR short for “machine intermediate representation” instead of “medium intermediate representation”? I generally expect IRs to be relatively platform-independent but it sounds like the “MIR” here is close to Aarch64 binary?

  • TUSF a day ago

    Zig has a couple of IR layers. Generally, Zig's compiler goes AST → ZIR → AIR, and from there it'll either emit LLVM bitcode or lower through one of its own platform-specific "Machine IR"s.

  • sakras a day ago

    Likely refers to Machine IR, a lower level representation that normal LLVM IR lowers to?

MuffinFlavored a day ago

What's the motivation to avoid LLVM backends?

  • mananaysiempre a day ago

    > In exchange [for eliminating the dependency on LLVM], Zig gains these benefits:

    > All our bugs are belong to us.

    > The compiler becomes trivial to build from source and to bootstrap with only a C compiler on the host system.

    > We stop dealing with annoying problems introduced by Linux distributions and package managers such as Homebrew related to LLVM, Clang, and LLD. There have been and continue to be many.

    > The Zig compiler binary goes from about 150 MiB to 5 MiB.

    > Compilation speed is increased by orders of magnitude.

    > [...]

    https://github.com/ziglang/zig/issues/16270

    • donio a day ago

      The Go toolchain is a nice illustration of this approach working in practice. It fully bootstraps in 90 seconds on my aging laptop and since it's fully self-hosted it doesn't even need a C compiler unless you want cgo support.

      LLVM takes 2 hours to build on the same host and zig (with the LLVM backend) is another 20 minutes. It will be awesome if that can be brought down to 2 minutes or less.

      • AndyKelley a day ago

        Is that building Go with Go? Or actual bootstrapping? Check this out...

        Building Zig with Zig:

            andy@bark ~/s/zig (master)> time zig build
            
            ________________________________________________________
            Executed in   11.67 secs    fish           external
        
        Bootstrapping with only a C compiler dependency (not even make or shell!):

            andy@bark ~/s/zig (master)> time cc -o bootstrap bootstrap.c; and time ./bootstrap
            
            ________________________________________________________
            Executed in   55.10 millis    fish           external
            
            gcc -o zig-wasm2c stage1/wasm2c.c -O2 -std=c99
            ./zig-wasm2c stage1/zig1.wasm zig1.c
            gcc -o zig1 zig1.c stage1/wasi.c -std=c99 -Os -lm
            ./zig1 lib build-exe -ofmt=c -lc -OReleaseSmall --name zig2 -femit-bin=zig2.c -target x86_64-linux --dep build_options --dep aro -Mroot=src/main.zig -Mbuild_options=config.zig -Maro=lib/compiler/aro/aro.zig
            ./zig1 lib build-obj -ofmt=c -OReleaseSmall --name compiler_rt -femit-bin=compiler_rt.c -target x86_64-linux -Mroot=lib/compiler_rt.zig
            gcc -o zig2 zig2.c compiler_rt.c -std=c99 -O2 -fno-stack-protector -Istage1 -Wl,-z,stack-size=0x10000000 -pthread
            
            ________________________________________________________
            Executed in  305.06 secs    fish           external

        • donio a day ago

          > Is that building Go with Go? Or actual bootstrapping?

          Normally it's just Go with Go. Besides the Go compiler you need bash if you want to use the normal bootstrap script but not much else. You can build your way up from C by building an old enough version of Go that was still C based but that's not usually done these days.

          > Executed in 11.67 secs

          Nice!

      • cxr a day ago

        Sometime after Minix 3 but before it had attained the critical mass for a self-sustaining community, compilation times went from 10 minutes on low-end hardware to ~3 hours, and the answer to the question "Why?" was "LLVM/clang".

    • wiz21c a day ago

      Maybe I'm wrong, but Rust is still using LLVM. When I see that list of benefits, I wonder why Rust stays on it... (honest question, I use Rust every day and I'm happy with it, except for compilation times :-) )

      • zozbot234 21 hours ago

        Rust has cranelift as a natively bootstrapped alternative these days. No different from Golang or Zig.

        • MuffinFlavored 5 hours ago

          When will it become the default?

          What does it say about LLVM project that everybody starts off on it and then gets off of it?

      • rowanG077 21 hours ago

        Rust depends heavily on the LLVM optimization pipeline afaik. So it would be a heavy investment to write native backends.

    • remindmeagain a day ago

      Does this mean `zig c++` is going away with LLVM dropped? That would be a shame - so useful.

      • forrestthewoods a day ago

        As long as Zig builds the glibc shim libraries those can be used in a separate Clang build system.

        It’d be nice if Zig added a command to emit them more explicitly to use in non-Zig build systems. Would be awesome actually. Plz?

        • remindmeagain a day ago

          I'm trying to wrap my mind around what you wrote. Do you envision a different binary separate from zig?

          The closest thing I could find online is: https://stackoverflow.com/questions/78892396/how-to-link-to-...

          • forrestthewoods a day ago

            Cross-compiling is super super easy conceptually. All you need is headers and an import library.

            Windows is trivial to cross compile for because there is a single set of headers. And for DLLs, Windows compilers like to generate an import lib, which is basically a stub for every export.

            Linux is stupid and sucks. Linux doesn’t have a single set of headers because programmers in the 80s didn’t know better. And Linux linking to a shared library expects you to have a full copy of the library you expect to exist at runtime. Which is fucking stupid and ass backwards. But I digress.

            So. Why is Zig awesome for crosscompiling C++? Because Zig fixed Linux sucking ass. Zig moves mountains to generate thin “import libs” of the half dozen .so files you need to link against to run on Linux.

            If you want to cross compile C++ for Linux all you need is clang++ executable, headers, and those stub .so files. Zig generates the libs implicitly as part of its zig cc build process. I’m asking for the half dozen (C) or dozen (C++) libs to be explicitly exportable by Zig. So that they can be trivially used in other build systems.

            I’ve got a custom polyglot build system that leverages Zig. But I have to zig cc a hello_world.cpp and parse the output to get the paths to those libs. Which is annoying and janky.

            Hopefully that helps?

            • TUSF a day ago

              > Linux is stupid and sucks.

              None of this is Linux's fault. I'd argue this is both C's and GNU's fault. The need for a copy of the library seems to be just a tooling convention: GCC's linker requires it, so others do the same thing. The executable itself doesn't need to know where a given symbol is in a shared library's memory at runtime (or even which shared library a symbol comes from, just that it's inside one of the libraries declared as needed in its dynamic section), because that's the (runtime) linker's job to figure out. Regardless, you don't actually need a full copy of the library—a stub will suffice.

              I don't know a ton about compilers, but as far as I know, there's no reason clang's linker (LLD) couldn't just not require an existing copy of the shared library you're linking to. Can't do anything about requiring a version of the libc headers you need for every platform though.

              • forrestthewoods a day ago

                You’re totally wrong. Linux doesn’t suck. Merely all the tools you are required to use on Linux suck. Which for all intents and purposes means Linux sucks. IMHO.

                • forrestthewoods a day ago

                  I meant you’re NOT totally wrong. Sorry about that!

            • rstat1 a day ago

              >> Linux doesn’t have a single set of headers because programmers in the 80s didn’t know better.

              I would argue that it does, but because it's apparently illegal (obvious exaggeration alert) to package all the parts of a single entity together in one package, it just seems like it doesn't, whereas on Windows there's a single package (the Windows SDK) that contains the majority of the relevant stuff.

              I do however 100% agree with you on linking to shared libraries. The way Linux compilers handle that is fucking stupid.

  • tw061023 a day ago

    LLVM is basically a resource pool for C++ compiler development. As such, it is highly C++ specific and leaks C++ semantics everywhere.

    It's especially funny when this happens in Rust, which is marketed as a "safer" alternative.

    Would you like a segfault out of nowhere in safe Rust? The issue is still open after two years by the way: https://github.com/rust-lang/rust/issues/107975

    • saghm a day ago

      It's not clear to me what you're getting at with regard to that issue. As far as I can tell, there's not really any indication that this is undefined behavior. Yes, there seems to be a bug of some sort in the code being generated, but it seems like a stretch to imply that any bug that generates incorrect code is necessarily a risk of UB. Maybe I'm missing some assumption about what the pointers not being equal implies, but given that you can't actually dereference `*const T` in safe Rust, I don't see how you can conclude that having two of them incorrectly compare as unequal could lead to unsafety.

      • tux3 12 hours ago

        If you read the Github issue, this one was weaponized fairly straightforwardly by taking the difference between the two pointers.

        The difference is zero, but the compiler thinks it is non-zero because it thinks they are unequal.

        From there you turn it into type confusion through an array, and then whatever you want. Almost any wrong compiler assumption can be exploited. This particular technique has also been used several times to exploit bugs in JavaScript engines.

    • ncruces a day ago

      Yeah, using LLVM for anything trying to avoid UB is crazy.

      I got involved in a discussion with a Rust guy when trying to get C with SIMD intrinsics into wasi-libc, where something the C standard explicitly states is “implementation defined” (and so sane, as we're targeting a single implementation, LLVM) can't be trusted, because LLVM may turn it back into UB because “reasons.”

      At this point Go and Zig made the right choice to dump it. I don't know about Rust.

      https://github.com/WebAssembly/wasi-libc/pull/593

      • AndyKelley a day ago

        It sounds like you have a fundamental misunderstanding about undefined behavior. It's easy to emit LLVM IR that avoids undefined behavior. The language reference makes it quite clear what constitutes undefined behavior and what does not.

        The issue is that frontends want to emit code that is as optimizable as possible, so they opt into the complexity of specifying additional constraints, attributes, and guarantees, each of which risks triggering undefined behavior if the frontend has a bug and emits wrong information.

        • ncruces 19 hours ago

          Hi Andy. Did you read the linked thread?

          I was not the one making this claim:

          > However, I believe that currently, there is no well-defined way to actually achieve this on the LLVM IR level. Using plain loads for this is UB (even if it may usually work out in practice, and I'm sure plenty of C code just does that).

          My claim is that the snippet below is implementation-defined (not UB):

            // Casting through uintptr_t makes this implementation-defined,
            // rather than undefined behavior.
            uintptr_t align = (uintptr_t)s % sizeof(v128_t);
            const v128_t *v = (v128_t *)((uintptr_t)s - align);
          
          Further, that this is actually defined by the implementation to do the correct thing, by any good faith reading of the standard:

          > The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.

          I further suggested laundering the pointer with something like the below, but was told it would amount to nothing, again the blame being put on LLVM:

            asm ("" : "+r"(v))
          
          I honestly don't know whether LLVM or clang is to blame. I was told it was LLVM IR and took that in good faith.

          • AndyKelley 13 hours ago

            No, I hadn't read the linked thread until you prodded me. Now I have and I understand the situation entirely. I'll give a brief overview; feel free to ask any followup questions.

            A straightforward implementation of memchr, i.e. finding the index of a particular byte inside an array of bytes, looks like this:

                fn memchr(bytes: []const u8, search: u8) ?usize {
                    for (bytes, 0..) |byte, i| {
                        if (byte == search) return i;
                    }
                    return null;
                }
            
            This is trivial to lower to well-defined LLVM IR.

            But it's desirable to use tricks to make the function really fast, such as assuming that you can read up to the page boundary with SIMD instructions[1]. This is generally true on real world hardware, but this is incompatible with the pointer provenance memory model, which is load-bearing for important optimizations that C, C++, Rust, and Zig all rely on.

            So if you want to do such tricks you have to do them in a black box that is exempt from the memory model rules. The Zig code I link to here is unsound because it does not do this. An optimization pass, whether implemented in the Zig pipeline or the LLVM pipeline, would be able to prove that it accesses memory outside a pointer's provenance, mark that particular control flow unreachable, and thereby cause undefined behavior if it happens.

            This is not really LLVM's fault. This is a language shortcoming in C, C++, Rust, Zig, and probably many others. It's a fundamental conflict between the utility of pointer provenance rules, and the utility of ignoring that crap and just doing what you know the machine allows you to do.

            [1]: https://github.com/ziglang/zig/blob/0.14.1/lib/std/mem.zig#L...
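
            For readers who haven't seen the trick, here is a rough Zig sketch of the idea (illustrative only, not the actual std.mem code linked above): align the load down to the vector width so it cannot cross a page boundary, then ignore the lanes that fall before the start pointer.

                const std = @import("std");

                const vec_len = 16;
                const V = @Vector(vec_len, u8);

                /// Sketch: look for `search` in the aligned block containing `ptr`,
                /// ignoring lanes that fall before `ptr`. Returns an offset from `ptr`.
                fn scanFirstBlock(ptr: [*]const u8, search: u8) ?usize {
                    const addr = @intFromPtr(ptr);
                    const base = addr & ~@as(usize, vec_len - 1); // round down; load stays in one page
                    const block: *align(vec_len) const V = @ptrFromInt(base);
                    // This load reads up to 15 bytes *before* `ptr` -- bytes the caller
                    // never handed us. Hardware tolerates that; provenance rules do not.
                    const matches = block.* == @as(V, @splat(search));
                    // Mask off lanes that precede `ptr` within the aligned block.
                    const start: u8 = @intCast(addr - base);
                    const wanted = std.simd.iota(u8, vec_len) >= @as(V, @splat(start));
                    const hits = @select(bool, wanted, matches, @as(@Vector(vec_len, bool), @splat(false)));
                    const first = std.simd.firstTrue(hits) orelse return null;
                    return first - start;
                }

            A real implementation also has to cap the result at the slice length and then continue block by block, but the provenance problem is already there in that first aligned load.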

            • ncruces 7 hours ago

              Thanks for taking the time!

              I was the original contributor of the SIMD code, and got this… pushback.

              I still don't quite understand how you can marry “pointer provenance” with the intent that converting between pointers and integers is “to be consistent with the addressing structure of the execution environment”, and with wanting to allow DMA in your language, and yet have this be UB.

              But well, a workable version of it got submitted, I've made subsequent contributions (memchr, strchr, str[c]spn…), all good.

              Just makes me salty on C, as if I needed more reasons to.

              • AndyKelley 4 hours ago

                That's totally fair to be salty about a legitimately annoying situation. But I think it's actually an interesting, fundamental complexity of computer science, as opposed to some accidental complexity that LLVM is bringing to the table.

    • pjmlp a day ago

      Which is why nowadays most frontends have been migrating to MLIR, and there is ongoing work for clang as well.

      • AndyKelley a day ago

        How does migrating to MLIR address the problem?

        • pjmlp a day ago

          The higher abstraction level it provides over LLVM IR makes language frontends and compiler passes less dependent on LLVM's semantics.

          • alexrp a day ago

            As the guy currently handling Zig's LLVM upgrades, I do not see this as an advantage at all. The more IR layers I have to go through to diagnose miscompilations, the more of a miserable experience it becomes. I don't know that I would have the motivation to continue doing the upgrades if I also had to deal with MLIR.

            • pjmlp a day ago

              The LLVM project sees it otherwise, and the adoption across the LLVM community is quite telling about where they stand.

              • alexrp a day ago

                That doesn't seem like a good argument for why Zig ought to target MLIR instead of LLVM IR. I think I'd like to see some real-world examples of compilers for general-purpose programming languages using MLIR (ClangIR is still far from complete) before I entertain this particular argument.

                • pjmlp 19 hours ago

                  Would Flang do it? Fortran was once general purpose.

                  https://github.com/llvm/llvm-project/blob/main/flang/docs/Hi...

                  Maybe the work in Swift (SIL), Rust (MIR), and Julia (SSAIR) that was partially the inspiration for MLIR, alongside the work done at Google designing the TensorFlow compiler?

                  The main goal was an IR that would accommodate all the use cases of those high-level IRs.

                  Here are the presentation slides from the European LLVM Developers' Meeting back in 2019:

                  https://llvm.org/devmtg/2019-04/slides/Keynote-ShpeismanLatt...

                  Also, you can find many sufficiently general-purpose users in this listing:

                  https://mlir.llvm.org/users/

                  • pklausler 16 hours ago

                    Are you saying that Fortran was once a general purpose programming language, but somehow changed to no longer be one?

                    • pjmlp 15 hours ago

                      Yes, because we are no longer in the 1960s - 1980s.

                      C and C++ took over many of the use cases people were using Fortran for during those decades.

                      In 2025, while it is a general purpose language, its use is constrained to scientific computing and HPC.

                      Most wannabe CUDA replacements keep forgetting that Fortran is one of the reasons the scientific community ignored OpenCL.

                      • pklausler 14 hours ago

                        So you're saying that the changes made to Fortran have made it more specialized?

          • AndyKelley a day ago

            Huh?? That can only make frontends' jobs more tricky.

            • pjmlp a day ago

              Yet it has been embraced by everyone since its introduction in 2019, with its own organization and conference talks.

              So maybe all those universities, companies, and the LLVM project kind of know what they are doing.

              - https://mlir.llvm.org/

              - https://llvm.github.io/clangir/

              - https://mlir.llvm.org/talks/

              • AndyKelley a day ago

                No need to make a weird appeal to authority. Can you just explain the answer to my question in your own words?

                • marcelroed 21 hours ago

                  I am only familiar with MLIR for accelerator-specific compilation, but my understanding is that by describing operations at a higher level, you don’t need the frontend to know what LLVM IR will lead to the best final performance. For instance you could say "perform tiled matrix multiplication" instead of "multiply and add while looping in this arbitrary indexing pattern", and an MLIR pass can reason about what pattern to use and take whatever hints you’ve given it. This is especially helpful when some implementations should be different depending on previous/next ops and what your target hardware is. I think there’s no reason Zig can’t do something like this internally, but MLIR is an existing way to build primitives at several different levels of abstraction. From what I’ve heard it’s far from ergonomic for compiler devs, though…

                • pjmlp 19 hours ago

                  You see it as an appeal to authority; I see it as the community of frontend developers, based on the Swift and Rust integration experience and on work Chris Lattner did while at Google, feeding back into what the evolution of LLVM IR should look like.

                  Mojo and Flang were designed from scratch using MLIR, as were many other newer languages in the LLVM ecosystem.

                  I see it as the field experience of folks that know a little bit more than I ever will about compiler design.

  • ethan_smith 19 hours ago

    Zig's self-hosted compiler reduces compile times, enables better cross-compilation, allows finer control over codegen optimizations, and eliminates the large LLVM dependency which simplifies distribution and bootstrapping.

  • sakras a day ago

    Likely performance - LLVM is somewhat notorious for being slower than ideal.

EricRiese a day ago

+28,242 −3,737

LGTM

  • webdevver 14 hours ago

    honestly those numbers should be the other way around. extra 25k for an aarch64 backend...

    actually kind of sus. they should be able to do it in 10kloc. the cpu does most of the heavy lifting anyway. i certainly hope they aren't scheduling insns or doing something silly like that!