Perhaps storing the key would take too much space, or checking it would take too much time, or storing it would cause race condition issues in a multithreaded setting?
It’s also weird that GCC gets away with this at all, as many C programs in Linux that compile with GCC make deliberate use of out-of-bounds pointers.
But yeah, if you look at my patch to llvm, you’ll find that:
- I run a highly curated opt pipeline before instrumentation happens.
- FilPizlonator drops flags in LLVM IR that would have permitted downstream passes to perform UB driven optimizations.
- I made some surgical changes to clang CodeGen and some llvm passes to fix some obvious issues from UB
But also let’s consider what would happen if I hadn’t done any of that except for dropping UB flags in FilPizlonator. In that case, a pass before pizlonation would have done some optimization. At worst, that optimization would be a logic error or it would induce a Fil-C panic. FilPizlonator strongly limits UB to its “memory safe subset” by construction.
I call this the GIMSO property (garbage in, memory safety out).
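To make GIMSO concrete, here's a minimal illustration (my example, not from the post): a classic out-of-bounds write that is UB in standard C, so an optimizer is free to assume it never happens, but that Fil-C turns into a deterministic panic instead of silent corruption.

    #include <stdlib.h>

    int main(void) {
        char *p = malloc(8);
        // UB in standard C: write past the end of the allocation.
        // Under Fil-C the store is checked against p's capability,
        // so the worst case is a panic, never memory corruption.
        p[16] = 42;
        return 0;
    }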
> FilPizlonator drops flags in LLVM IR that would have permitted downstream passes to perform UB driven optimizations.
Does this work reliably or did your patches have to fix bugs here? There are LLVM bugs with floating point where the backend doesn't properly respect passed attributes during codegen, which violate the behaviors of user-level flags. I imagine the same thing exists for UB.

LLVM is engineered to be usable as a backend for type-safe/memory-safe languages. And those flags are engineered to work right for implementing the semantics of those languages, provided that you also do the work to avoid other LLVM pitfalls (and FilPizlonator does that work by inserting aggressive checks).
Of course there could be a bug though. I just haven't encountered this particular kind of bug, and I've tested a lot of software (see https://fil-c.org/programs_that_work)
Some programs have a ~4x slowdown. That's also not super common, but it happens.
Most programs are somewhere in the middle.
> for the use-cases where C/C++ are still popular
This is a myth. 99% of the C/C++ code you are using right now is not perf sensitive. It's written in C or C++ because:
- That's what it was originally written in and nobody bothered to write a better version in any other language.
- The code depends on a C/C++ library and there doesn't exist a high quality binding for that library in any other language, which forces the dev to write code in C/C++.
- C/C++ provides the best level of abstraction (memory and syscalls) for the use case.
Great examples are things like shells and text editors, where the syscalls you want to use are exposed at the highest level of fidelity in libc and if you wrote your code in any other language you'd be constrained by that language's library's limited (and perpetually outdated) view of those syscalls.
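To give a flavor of what "highest level of fidelity" means here (an illustrative snippet, not from the thread): flags like these are available the day libc exposes them, while bindings in other languages often trail behind or expose only a subset.

    #include <fcntl.h>
    #include <unistd.h>

    // Open a file relative to an already-open directory fd, refusing to
    // follow symlinks and preventing the fd from leaking across exec.
    int open_in_dir(int dirfd, const char *name) {
        return openat(dirfd, name, O_RDONLY | O_NOFOLLOW | O_CLOEXEC);
    }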
I am curious though. What assumptions do you think I'm making that you think are invalid?
(Browser performance is like megapixels or megahertz … a number that marketing nerds can use to flex, but that is otherwise mostly irrelevant)
When I say 99% of the C code you use, I mean “use” as a human using a computer, not “use” as a dependency in your project. I’m not here to tell you that your C or C++ project should be compiled with Fil-C. I am here to tell you that most of the C/C++ programs you use as an end user could be compiled with Fil-C and you wouldn’t experience a degraded experience if that happened.
People under-estimate how much code gets written in these languages just because decades ago they were chosen as the default language of the project and people are resistant to going full polyglot. Then everything gets written that way including cold paths, utilities, features that are hardly ever used, UI code that's bottlenecked by the network...
Since performance is largely correlated to battery life, of course I would notice. An Nx reduction in battery life would certainly be a degraded experience.
4x slowdown may be absolutely irrelevant in case of a software that spends most of its time waiting on IO, which I would wager a good chunk of user-facing software does. Like, if it has an event loop and does a 0.5 ms calculation once every second, doing the same calculation in 2 ms is absolutely not-noticeable.
For compilers, it may not make as much sense (not even due to performance reasons, but simply because a memory issue taking down the program would still be "well-contained", and memory leaks would not matter much as it's a relatively short-lived program to begin with).
And then there are the truly CPU-bound programs, but seriously, how often do you [1] see your CPU maxed out for long durations on your desktop PC?
[1] not you, pizlonator, just joining the discussion replying to you
They are most commonly running continuously, and reacting to different events.
You can't do the IO work that depends on CPU work ahead of time, and neither can you do CPU work that depends on IO. You have a bunch of complicated interdependencies between the two, and the execution time is heavily constrained by this directed graph. No matter how efficient your data manipulation algorithm is, it doesn't help if you still have to wait for the data to load from the web/file.
Just draw a Gantt chart and, sure, sum the execution times. My point is that due to interdependencies you will have a longest lane, and no matter what you do with the small CPU parts, you can only marginally affect the whole.
It gets even funnier with parallelism (this was just concurrency so far), where a similar concept is known as Amdahl's law.
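For reference, Amdahl's law says that if a fraction p of the total time is affected and that part changes speed by a factor s, the overall speedup is S = 1 / ((1 - p) + p/s). The same bookkeeping works for a slowdown: if, say, 10% of wall time is CPU work and it gets 4x slower, total time scales by 0.9 + 0.4 = 1.3, i.e. a 1.3x end-to-end slowdown despite the 4x-slower CPU part.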
And I would even go as far as to claim that what you may win with C you often lose several-fold by going with a simpler parallelism model for fear of segfaults, whereas in a higher-level language you could parallelize fearlessly.
Also, literally every language claims "only a x2 slowdown compared to C".
We still end up coding in C++, because see the first point.
> All code is perf-sensitive.
That can’t possibly be true. Meta runs on PHP/Hack, which are ridiculously slow. Code running in your browser is JS, which is like 40x slower than Yolo-C++ and yet it’s fine. So many other examples of folks running code that is just hella slow, way slower than “4x slower than C”
I'm doing some for loops in bash right now that could use 1000x more CPU cycles without me noticing.
Many programs use negligible cycles over their entire runtime. And even for programs that spend a lot of CPU and need tons of optimizations in certain spots, most of their code barely ever runs.
> Also, literally every language claims "only a x2 slowdown compared to C".
I've never seen anyone claim that a language like python (using the normal implementation) is generally within the same speed band as C.
The benchmark game is an extremely rough measure but you can pretty much split languages into two groups: 1x-5x slowdown versus C, and 50x-200x slowdown versus C. Plenty of popular languages are in each group.
Live long enough and you will. People claimed it about PyPy back in the day when it was still hype.
- Code that uses SIMD or that is mostly dealing with primitive data in large arrays will get close to 1x
- Code that walks trees and graphs, like interpreters or compilers do, might end up north of 2x unless I am very successful at implementing all of the optimizations I am envisioning.
- Code that is IO bound or interactive is already close to 1x
That actually brings up another question: how would trying to run a JIT like V8 inside Fil-C go? I assume there would have to be some bypass/exit before jumping to generated code - would there need to be other adjustments?
I’ll admit that if you are in the business of counting instructions then other things in Fil-C will kill you. Most of the overhead is from pointer chasing.
Those are the environments that John upthread was talking about when he said:
> There's tons of embedded use cases where a GC is not going to fly just from a code size perspective, let alone latency. That's mostly where I've often seen C (not C++) for new programs.
But I've seen C++ there too.
If you're worried about the code size of a GC you probably don't have a filesystem.
But I don't really think it's meaningful to bring that up as it is a niche of a niche. Soft real-time (which most people may end up touching, e.g. video games) is much more forgiving; see all the games running on Unity with a GC. An occasional frame drop won't cause an explosion here, and managed languages are more than fine.
I don't agree that "it is a niche of a niche". There are probably 32× as many computers in your house running hard-real-time software as computers that aren't. Even Linux used to disable interrupts during IDE disk accesses!
I say that even though, as you noticed in another reply, I worked on research to try to make GC suitable for exactly those environments. I had some cool demos, and a lot of ideas in FUGC come from that. But I would not recommend you use GC in those environments!
There is a way to engineer Fil-C to not rely on GC. InvisiCaps would work with isoheaps (what those embedded dudes would just call "object pools"). So, if we wanted to make a Fil-C-for-flight-software then that's what it would look like, and honestly it might even be super cool
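For readers who haven't seen the term: an isoheap here is essentially what embedded folks call an object pool, one heap per type, with slots reused but never handed out as a different type. A minimal sketch (names and sizes invented for illustration):

    #include <stddef.h>

    #define POOL_SLOTS 64
    typedef struct { char payload[32]; } msg_t;

    static msg_t slots[POOL_SLOTS];      // storage reserved up front
    static msg_t *free_list[POOL_SLOTS]; // stack of available slots
    static size_t free_top;

    void pool_init(void) {
        for (size_t i = 0; i < POOL_SLOTS; i++)
            free_list[free_top++] = &slots[i];
    }

    msg_t *pool_alloc(void) {  // O(1), no fragmentation, no GC needed
        return free_top ? free_list[--free_top] : NULL;
    }

    void pool_free(msg_t *m) { // slot is only ever reused as another msg_t
        free_list[free_top++] = m;
    }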
Meh. I was in the real time GC game for a while, when I was younger. Nobody agrees on what it really means to bound the worst case. If you're a flight software engineer, it means one thing. If you're a game developer, it means something else entirely. And if you're working on the audio stack specifically, it means yet another thing (somewhere in between game and flight).
So let me put it this way, using the game-audio-flight framework:
- Games: I bound worst case execution time, just assuming a fair enough OS scheduler, even on uniprocessor.
- Audio: I bound worst case execution time if you have multiple cores.
- Flight: I don't bound worst case execution time. Your plane crashes and everyone is dead
This is bottlenecked on memory access that is challenging to avoid in C. You could speed it up by at least 2× with some compiler support, and maybe even without it, but I haven't figured out how. Do you have any ideas?
Typically, though, when you are trying to do WCET analysis, as you know, you try to avoid any dynamic allocation in the time-sensitive part of the program. After all, if completing a computation after a deadline would cause a motor to catch fire or something, you definitely don't want to abort the computation entirely with an out-of-memory exception!
Some garbage collectors can satisfy this requirement just by not interfering with code that doesn't allocate, but typically not concurrent ones.
> how would trying to run a JIT like V8 inside Fil-C go?
You’d get a Fil-C panic. Fil-C wouldn’t allow you to PROT_EXEC lol
What if I also use cars, and airplanes, and dishwashers, and garage doors, and dozens of other systems? At what point does most of the code I interact with /not/ have lots of breathing room? Or does the embedded code that makes the modern world run not count as "programs"?
First of all, I’m not advocating that people use Fil-C in places where it makes no sense. I wouldn’t want my car’s control system to use it.
Car systems are big if they have 100 million lines of code, or maybe a billion. But your desktop OS is at like 10 billion and growing! Throw in the code that runs in servers that you rely on and we might be at 100 billion lines of C or C++.
* Compilers (including clang)
* Most interpreters (Python, Ruby, etc.)
* Any simulation-heavy video game (and some others)
* VSCode (guess I should've stuck with Sublime)
* Any scientific computing tools/libraries
Sure, I probably won't notice if zsh or bash got 2x slower and cp will be IO bound anyway. But if someone made a magic clang pass that made most programs 2x faster they'd be hailed as a once-in-a-generation genius, not blown off with "who really cares about C/C++ performance anyway?". I'm not saying there's no place for trading these overheads for making C/C++ safer, but treating it as a niche use-case for C/C++ is ludicrous.
Ruby is partially written in Rust nowadays.
VSCode uses plenty of Rust and .NET AOT on its extensions, alongside C++, and more recently Webassembly, hence why it is the only Electron garbage with acceptable performance.
Unity and Unreal share a great deal of games, with plenty of C#, Blueprints, Verse, and a GC for C++.
* DAWs and audio plugins
* video editors
Audio plugins in particular need to run as fast as possible because they share the tiny time budget of a few milliseconds with dozens or even hundreds of other plugin instances. If everything is suddenly 2x slower, some projects simply won't run in realtime anymore.
Reasons why it's stupidly bad:
- So many missing compiler optimizations (obviously those will also improve perf too).
- When the compiler emits metadata for functions and globals, like to support accurate GC and the stack traces you get on Fil-C panic, I use a totally naive representation using LLVM structs. Zero attempt to compress anything. I'm not doing any of the tricks that DWARF would do, for example.
- In many cases it means that strings, like names of functions, appear twice (once for the purposes of the linker and a second time for the purposes of my metadata).
- Lastly, an industrially optimized version of Fil-C would ditch ELF and just have a Fil-C-optimized linker format. That would obviate the need for a lot of the cruft I emit that allows me to sneakily make ELF into a memory safe linker. Then code size would go down by a ton
I wish I had data handy on just how much I bloat code. My totally unscientific guess is like 5x
It is decades of backend optimisation work, some of it exploring UB-based optimizations, that has made that urban myth possible.
As the .NET team discovered, and points out on each release since .NET 5 in lengthy blog posts able to kill most browsers' buffers, if the team puts as much work into the JIT and AOT compilers as the Visual C++ team does, then performance looks quite different from what everyone else expects it naturally to be.
In JS for example, if you can write your code as a tight loop operating on ArrayBuffer views you can achieve near C performance. But that’s only if you know what optimizations JS engines are good at and have a mental model how processors respond to memory access patterns, which very few JS developers will have. It’s still valid to note that idiomatic JS code for an arbitrary CPU-bound task is usually at least tens of times slower than idiomatic C.
I think an important issue is that for performance-sensitive C++ stuff and related domains, it's somewhat all or nothing with a lot of these tools. Like, a CAD program is ideally highly performant, but I also don't want it to own my machine if I load a malicious file. I think that's the hardest thing, and there isn't any easy lift-and-shift solution for it.
I think some C++ projects probably could actually accept a 2x slowdown, honestly. Like I'm not sure if LibrePCB taking 2x as long in cycles would really matter. Maybe it would.
IIRC Crubit C++/Rust Interop is from the chrome team: https://github.com/google/crubit
That might well change, but it's what their docs currently say.
It's not, actually: https://source.chromium.org/chromium/chromium/src/+/main:doc...
> Rust can be used anywhere in the Chromium repository (not just //third_party) subject to current interop capabilities, however it is currently subject to a internal approval and FYI process. Googlers can view go/chrome-rust for details. New usages of Rust are documented at rust-fyi@chromium.org.
It is true that two years ago it was only third party, but it's been growing ever since.
Here’s my source. I’m porting Linux From Scratch to Fil-C
There is load bearing stuff in there that I’d never think of off the top of my head that I can assure you works just as well even with the Fil-C tax. Like I can’t tell the difference and don’t care that it is technically using more CPU and memory.
So then you’ve got to wonder, why aren’t those things written in JavaScript, or Python, or Java, or Haskell? And if you look inside you just see really complex syscall usage. Not for perf but for correctness. It’s code that would be zero fun to try to write in anything other than C or C++.
It’s a typed shell. So you can do jq-like data manipulation against a plethora of different documents. Unlike Zsh et al that are still ostensibly limited to just whitespace-delimited lists.
How does it compare to something like RLBox?
> This is a myth. 99% of the C/C++ code you are using right now is not perf sensitive.
I don't think that's true, or at best it's a very contorted definition of "perf sensitive". Most code is performance sensitive in my opinion - even shitty code written in Python or Ruby. I would like it to be faster. Take Asciidoctor for example. Is that "performance sensitive"? Hell yes!
I don’t know and it doesn’t matter because RLBox doesn’t make your C code memory safe. It only containerizes it.
Like, if you put a database in RLBox then a hacker could still use a memory safety bug to corrupt or exfiltrate sensitive data.
If you put a browser engine in RLBox then a hacker could still pwn your whole machine:
- If your engine has no other sandbox other than RLBox then they’d probably own your kernel by corrupting a buffer in memory that is being passed to a GPU driver (or something along those lines). RLBox will allow that corruption because the buffer is indeed in the program’s memory.
- If the engine has some other sandbox on top of RLBox then the bad guys will corrupt a buffer used for sending messages to brokers, so they can then pop those brokers. Just as they would without RLBox.
Fil-C prevents all of that because it uses a pointer capability model and enforces it rigorously.
So, RLBox could be infinitely faster than Fil-C and I still wouldn’t care.
So the question of performance is still relevant, even if RLBox's security properties are less tight.
They're just way too different. RLBox is a containerization technology. Fil-C is a memory safety technology.
Like, there's a world where you would use both of them stacked on top of one another, because Fil-C does memory safety without containerizing while RLBox does containerization without memory safety.
It's like comparing apples and oranges - which I always found to be a weird saying because of course it makes sense to compare apples and oranges.
Edit: looked it up and apparently it was originally "apples and oysters" which makes way more sense
https://english.stackexchange.com/a/132907/114887
Although even then it isn't incorrect to compare them. Perhaps you have to choose a canapé; we could have tiny slices of delicious apples, or succulent oysters. But it's impossible to choose because there's no way to compare them arrrghhh!
I wonder what the "most different" two things are.
Most of us sadly cargo-cult urban myths and, like Saint Thomas, only believe in what is running in front of us, so we never get any kind of feeling for how great some things could be.
Hence so many UNIX and C clones instead of something new. To be honest, those two guys were also visionaries back in the 1970s, despite some of the flaws.
> The fast path of a pollcheck is just a load-and-branch.
A neat technique I've seen used to avoid these branches is documented at https://android-developers.googleblog.com/2023/11/the-secret... under "Implicit suspend checks".
I’m very far from doing those kinds of low level optimizations because I have a large pile of very high level and very basic optimizations still left to do!
I am working on adding threads to Virgil (slowly, in the background, heh). I'll use the simple load+branch from the TLS for the simple reason that the GC code is also written in Virgil and it must be possible to turn off safepoints that have been inserted into the GC code itself, which is easy and robust to do if they are thread-local.
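For anyone unfamiliar with the idiom, the load+branch fast path being discussed looks roughly like this (a sketch; the field and function names are invented, not Fil-C's or Virgil's actual runtime API):

    #include <stdbool.h>

    struct thread_state {
        bool pollcheck_requested;  // set by the GC during a soft handshake
        /* ...other per-thread runtime state... */
    };

    extern __thread struct thread_state my_thread;   // thread-local, GCC/Clang extension
    void pollcheck_slow(struct thread_state *);      // report roots, ack the handshake

    static inline void pollcheck(void) {
        // Fast path: one thread-local load and one predictable branch.
        if (__builtin_expect(my_thread.pollcheck_requested, 0))
            pollcheck_slow(&my_thread);              // taken only when the GC asks
    }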
This article glosses over what I consider the hardest part - the enter/exit functionality around native functions that may block (but which must touch the allocator).
No it's not, not even close.
> This article glosses over what I consider the hardest part - the enter/exit functionality around native functions that may block (but which must touch the allocator).
Yeah, that part is hard, and maybe I'll describe it in another post.
Look for `filc_enter` and `filc_exit` in https://github.com/pizlonator/fil-c/blob/deluge/libpas/src/l...
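The shape of the pattern, as I read it (a sketch only; the real filc_enter/filc_exit in the linked source may take different arguments and do considerably more):

    #include <unistd.h>

    // Assumed declarations, for illustration only.
    void filc_enter(void);
    void filc_exit(void);

    // While "exited", the GC can treat this thread as if it were parked at a
    // safepoint, so a blocking syscall can't stall a soft handshake. The
    // thread must not touch GC-managed memory until it re-enters.
    ssize_t read_blocking(int fd, void *buf, size_t len) {
        filc_exit();
        ssize_t n = read(fd, buf, len);
        filc_enter();
        return n;
    }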
Never heard of this one, looking forward to diving in this weekend.
The stack scan is really fast. There's not a lot of logic in there. If you max out the stack height limit (megabytes of stack?) then maybe that means milliseconds of work to scan that stack. That's still not bad.
Probably not. Your game would be inches of stack away from crashing
(It would be a linear scan if I was just conservatively scanning, but then I'd have other problems.)
This is one of the most counterintuitive parts of GC perf! You'd think that the stack scan had to be a bottleneck, and it might even be one in some corner cases. But it's just not the longest pole in the tent most of the time, because you're so unlikely to actually have a 1MB stack, and programs that do have a 1MB stack tend to also have ginormous heaps (like many gigabytes), which then means that even if the stack scan is a problem it's not the problem.
I don't know, maybe the fact that I'm disagreeing with someone who knows a lot more than I do about the issue should be a warning sign that I'm probably wrong?
(I’m not trying to BS my way here - I’m explaining the reason why on the fly GC optimization almost never involves doing stuff about the time it takes to scan stack. It’s just not worth it. If it was, we’d be seeing a lot of stacklet type optimizations.)
To be fair, FUGC doesn’t currently let you do that. The GC runs in a separate thread and soft handshakes at various points, which cause your game thread to react at poll checks and exits that might not be at end of tick.
But I could add a feature that lets you force handshake responses to be at end of tick! That sounds like a good idea.
Because if they get converted to integers and then stored to the heap then they lose their capability. So accesses to them will trap and the GC doesn’t need to care about them.
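A tiny example of the scenario being described (my illustration; behavior as explained above):

    #include <stdint.h>
    #include <stdlib.h>

    int main(void) {
        int *p = malloc(sizeof *p);
        uintptr_t *slot = malloc(sizeof *slot);
        *slot = (uintptr_t)p;     // stored to the heap as a plain integer:
                                  // the capability does not travel with it
        int *q = (int *)*slot;    // reconstructed from the integer alone
        *q = 7;                   // per the explanation above, this traps in
                                  // Fil-C, so the GC never needs to treat
                                  // *slot as a root
        return 0;
    }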
Also it’s not “Dijkstra accurate”. It’s a Dijkstra collector in the sense that it uses a Dijkstra barrier. And it’s an accurate collector. But these are orthogonal things
https://github.com/protocolbuffers/protobuf/blob/cb873c8987d...
    // This somewhat silly looking add-and-subtract behavior provides provenance
    // from the original input buffer's pointer. After optimization it produces
    // the same assembly as just casting `(uintptr_t)ptr+input_delta`
    // https://godbolt.org/z/zosG88oPn
    size_t position =
        (uintptr_t)ptr + e->input_delta - (uintptr_t)e->buffer_start;
    return e->buffer_start + position;
It does use the implementation-defined behavior that a char pointer + 1 casted to uintptr is the same as casting to uintptr then adding 1.

Code that strives to preserve provenance works in Fil-C.
Let's say I malloc(42) then print the address to stdout, and then do not otherwise do anything with the pointer. Ten minutes later I prompt the user for an integer, they type back the same address, and then I try to write 42 bytes to that address.
What happens?
Edit: ok I read up on GC literature briefly and I believe I understand the situation.
"conservative" means the garbage collector does not have access to language type system information and is just guessing that every pointer sized thing in the stack is probably a pointer.
"accurate" means the compiler tells the GC about pointer types, so it knows about all the pointers the type system knows about.
Neither of these are capable of correctly modeling the C language semantics, which allows ptrtoint / inttoptr. So if there are any tricks being used like xor linked lists, storing extra data inside unused pointer alignment bits, or a memory allocator implementation, these will be incompatible even with an "accurate" garbage collector such as this.
I should add, this is not a criticism, I'm just trying to understand the design space. It's a pretty compelling trade offer: give up ptrtoint, receive GC.
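For example, the low-bit tagging trick mentioned above, in minimal form (illustrative only): legal-ish in plain C, but the stored value is no longer something an accurate GC or a capability system can recognize as a reference.

    #include <stdint.h>
    #include <stdlib.h>

    typedef struct node { int value; } node;

    // Stash a one-bit flag in the low bit of an aligned pointer.
    static uintptr_t tag(node *p, int flag) { return (uintptr_t)p | (flag & 1); }
    static node *untag(uintptr_t t)         { return (node *)(t & ~(uintptr_t)1); }

    int main(void) {
        node *n = malloc(sizeof *n);
        uintptr_t stored = tag(n, 1);  // only the tagged integer is kept around
        untag(stored)->value = 7;      // fine under a conventional toolchain,
                                       // incompatible with the accurate GC /
                                       // Fil-C model discussed here
        free(n);
        return 0;
    }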
In particular, check out the sections called "Laundering Pointers As Integers" and "Laundering Integers As Pointers".
> This is because the capability is not stored at any addresses that are accessible to the Fil-C program.
How are they stored? Is the GC running in a different process?
The list of supported software is astounding: CPython, SQLite, OpenSSH, ICU, CMake, Perl5, and Bash, for example. There are a lot of things in that list that nobody is likely to ever rewrite in Rust.
I wonder if it's feasible to use Fil-C to do multitasking between mutually untrusted processes on a computer without an MMU? They're making all the right noises about capability security and nonblocking synchronization and whatnot.
Does anyone have experience using it in practice? I see that https://news.ycombinator.com/item?id=45134852 reports a 4× slowdown or better.
The name is hilarious. Feelthay! Feelthay!
You could. That said, FUGC’s guts rely on OS features that in turn rely on an MMU.
But you could make a version of FUGC that has no such dependency.
As for perf - 4x is the worst case and that number is out there because I reported it. And I report worst case perf because that’s how obsessive I am about realistically measuring, and then fanatically resolving, perf issues
Fact is, I can live on the Fil-C versions of a lot of my favorite software and not tell the difference
What's the minimal code size overhead for FUGC?
I don’t have good data on this.
The FUGC requires the compiler to emit extra metadata and that metadata is hilariously inefficient right now. I haven’t bothered to optimize it. And the FUGC implementation pulls in all of libpas even though it almost certainly doesn’t have to.
So I don’t know what minimal looks like right now
I guess there would be no way to verify that precompiled user programs actually enforce the security boundaries. The only way to guarantee safety in such a system would be to compile everything from source yourself.
The incentives are strong to drive the runtime cost as close to zero as possible, which involves making your proof-checking system so expressive that it's hard to get right. The closer you get to zero, the more performance-sensitive your userbase gets. No part of your userbase is actively testing the parts of your verifier that reject programs; they try to avoid generating programs that get rejected, and as the verifier gets larger and larger, it requires more and more effort to generate code that exploits one of its bugs, although there are more and more of them. As the effort required to slip malicious code past the verifier grows, the secret in-house tools developed by attackers give them a larger and larger advantage over the defenders.
Continued "maintenance" applied to the verifier drive its size and complexity up over time, driving the probability of a flaw inexorably toward 100%, while, if it is not "maintained" through continuous changes, it will break as its dependencies change, it will be considered outdated, and nobody will understand it well enough to fix a bug if it does surface.
We've seen this happen with Java, and although it's clearly not unavoidable in any kind of logical sense, it's a strong set of social pressures.
Dynamic checking seems much more memetically fit: developers will regularly write code that should fail the dynamic checks, and, if it passes instead, they will send you an annoyed bug report about how they had to spend their weekend debugging.
We've seen a lot of complexity happen with Java but that was generally in the standard library facilities that are bundled with the language and runtime, not really the basic JVM type checking pass. Proof carrying code is closer to the latter.
There have been numerous holes discovered in various implementations of the basic JVM type checking, often after existing for many years.
...of sufficient power, eg. that can model arithmetic with both addition and multiplication. I think the caveats are important because systems that can't fully model arithmetic are often still quite useful!
What would happen if you tried to do PCC for InvisiCaps and FUGC is that it would ultimately constrain what optimizations are possible, because the optimizer would only be able to pick from the set of tricks that it could express a proof for within whatever proof system we picked
Totally not the end of the world.
Do I think this an interesting thing to actually do? I’m not sure. It’s certainly not the most interesting thing to do with Fil-C right now
Maybe you're right, and those problems are not inevitable; for example, if you could reduce the proofs to a tiny MetaMath-like kernel that wouldn't need constant "maintenance". As you say, that could move the compiler's optimizer out of the TCB — at least for the security properties Fil-C is enforcing, though the optimizer could still cause code to compute the wrong answers.
That seems like something people would want if they were already living in a world where the state of computer security was much better than it is now.
IBM i applications are compiled to a hardware-independent intermediate representation called TIMI, which the SLIC (kernel) can then compile down to machine code, usually at program installation time. As the SLIC is also responsible for maintaining system security, there's no way for a malicious user to sneak in a noncompliant program.
And unlike AS400, I don't think either Smalltalk or Lisp machines used the bytecode abstraction to achieve security.
I have found bugs in the software that I’ve ported, yeah.
I'm actually working on my own language that has a non-moving GC. It uses size classes (so 16 byte objects, 32 byte objects etc.), each of which is allocated in a contiguous slab of memory. Occupancy is determined by a bitmap, 1 bit for each slot in the slab.
The GC constructs a liveness bitmap for the size class, and the results are ANDed together, 'freeing' the memory. If you fill the slab with dead objects, then run the GC, it will not walk anywhere on this slab, create an all zero liveness bitmap, and free the memory.
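A sketch of that sweep, assuming one bit per slot and 64-bit words (details invented for illustration):

    #include <stdint.h>
    #include <string.h>

    #define SLOTS_PER_SLAB 256
    #define WORDS (SLOTS_PER_SLAB / 64)

    typedef struct {
        uint64_t allocated[WORDS]; // 1 = slot is in use
        uint64_t live[WORDS];      // rebuilt by the marker each GC cycle
    } slab_bits;

    // Sweep is one AND per 64 slots: dead objects are never touched
    // individually, and a slab full of dead objects collapses to all-zero
    // words without walking the slab's contents at all.
    void sweep(slab_bits *s) {
        for (int i = 0; i < WORDS; i++)
            s->allocated[i] &= s->live[i];
        memset(s->live, 0, sizeof s->live); // reset for the next cycle
    }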
The liveness bitmap approach is pretty widespread at this point; jemalloc works the same way IIRC.
Still, I think that counts as "visiting" in the context of this discussion: https://news.ycombinator.com/item?id=45137139
I agree that it's easy to add in a visitation pass, where you take the bitmap of live objects after marking and diff it with the currently existing one in order to signal that you might have a leak.
So basically, I think we're like 99% in agreement.
Typically bitmap-based allocators don't actually allow a 16-byte size class to have 32-byte objects in it, but I haven't looked at FUGC to see if that's true of it.
    // assume free_map is the bitmap where 1 means free
    uint32_t free_map;
    uint32_t free_map_2 = (free_map & (free_map >> 1)); // so on and so forth
I haven't really done anything like this yet, it has certain disadvantages, but you can pack multiple size classes into the same bitmap; you do a bit more work during alloc and resolving interior pointers is a bit more costly (if you have those), in exchange for having fewer size classes.
    t &= t << 1;
    t &= t << 2;
    t &= t << 2;
and that sort of thing is pretty appealing, but you lose the ability to know what size an object is just by looking at an address, and it's still a lot slower than scanning for an open slot in a page of 5× bigger objects.

Should I assume from your use of uint32_t that you're targeting embedded ARM microcontrollers?
A bunch of other optimizations fall out from doing that
Well, when using a bitmap (as they seem to do in the article), multiple consecutive dead objects are considered to be in the same range, because multiple consecutive bits in the bitmap have the value zero. There is no need to visit each zero bit in the bitmap separately.
But the malloc library could be fully integrated into the GC mechanism. In this case, there is no need. That's probably much easier, and faster, because the malloc can be simplified to bumping a pointer.
I do find the statement dubious still, do you mind clearing it up for me?
Given a page { void* addr; size_t size; size_t alignment; BitMap used; } where used's size in bits is page.size / page.alignment, surely we only need to visit the used bitmap for marking a memory slot as free?
FUGC does a SIMD sweep over a bit vector kept on the side, so it doesn’t visit the dead objects at all in the sense that it doesn’t touch their contents. And it only visits them in the sense that a single instruction deals with many dead or alive objects at once.
How much more memory do GC programs tend to use?
Curious, how do you deal with interior pointers, and not being able to store type info in object headers, like most GC languages do (considering placement new is a thing, you can't have malloc allocate a header then return the following memory, and pointer types can lie about what they contain)?
You mention 'accurate' by which I assume you use the compiler to keep track of where the pointers are (via types/stackmaps).
How do you deal with pointers that get cast to ints, and then back?
Avoid pointer chasing. Use SIMD.
> How much more memory do GC programs tend to use?
I would estimate 2x
Fil-C has additional overheads not related to GC, so maybe it’s higher. I haven’t started measuring and optimizing memory use in anger.
> Curious, how do you deal with interior pointers, and not being able to store type info in object headers, like most GC languages do (considering placement new is a thing, you can't have malloc allocate a header then return the following memory, and pointer types can lie about what they contain)?
I love the concept of Fil-C but I find that with the latest release, a Fil-C build of QuickJS executes bytecode around 30x slower than a regular build. Admittedly this is an informal benchmark running on a GitHub CI runner. I’m not sure if virtualization introduces overheads that Fil-C might be particularly sensitive to (?). But I’ve sadly yet to see anything close to a 4x performance difference. Perhaps I will try running the same benchmark on native non-virtualized x86 later today.
Also, so I am not just whining, my Fil-C patch to the QuickJS main branch contains a fix for an issue that’s only triggered by regex backtracking, and which I think you might have missed in your own QuickJS patch:
http://github.com/addrummond/jsockd/blob/main/fil-c-quickjs....
I know that I regressed quickjs recently when I fixed handling of unions. It’s a fixable issue, I just haven’t gone back and fixed it yet.
I definitely don’t see 30x overhead on anything else I run.
But thanks for pointing that out, I should probably actually fix the union handling the right way.
(What’s happening is every time quickjs bit casts doubles to pointers, that’s currently doing a heap allocation. And it’s obviously not needed. The simplest compiler analysis would kill it. I just turned off the previous instance of that analysis because it had a soundness issue)
Even if it worked for normal data flow, that's the sort of thing that's bound to introduce covert channels, I'd have thought. To start with I guess you have immediately disabled the mitigations of meltdown/spectre, because doesn't that happen when you switch processes?
But consider, for example, decoding JPEG, or maybe some future successor to JPEG, JEEG, by the Joint Evil Experts Group. You want to look at a ransom note that someone in the JEEG has sent you in JEEG format so that you know how much Bitcoin to send them. You have a JEEG decoder, but it was written by Evil Experts, so it might have vulnerabilities, as JPEG implementations have in the past, and maybe the ransom note JEEG is designed to overflow a buffer in it and install a rootkit. Maybe the decoder itself is full of malicious code just waiting for the signal to strike!
If you can run the JEEG decoder in a container that keeps it from accessing the network, writing to the filesystem, launching processes, executing forever, allocating all your RAM, etc., only being permitted to output an uncompressed image, even if you let it read the clock, it probably doesn't matter if it launches some kind of side-channel attack against your Bitcoin wallet and your Bitchat client, because all it can do is put the information it stole into the image you are going to look at and then discard.
You can contrive situations where it can still trick you into leaking bits it stole back to the JEEG (maybe the least significant bits of the ransom amount) but it's an enormous improvement over the usual situation.
Then, FalseType fonts...
CPython in Rust https://github.com/RustPython/RustPython
Bash in Rust https://github.com/shellgei/rusty_bash
> Warning: This software is ALPHA, only use for development, testing, and experimentation. We are working to make it production ready, but do not use it for critical data right now.
https://rustpython.github.io/pages/whats-left says:
> RustPython currently supports the full Python syntax. This is “what’s left” from the Python Standard Library.
Rusty_bash says:
> Currently, the binary built from alpha repo has passed 24 of 84 test scripts.
The CPython implementation is farther along than I had expected! I hope they make more progress.
Especially for the Turso project: if you look under "Insights -> Contributors" on their Github page, it's clear that the project is under heavy active development, and they have an actual funded business startup that wants to sell access to a cloud version of Turso, so they are definitely incentivized to complete it.
SQLite was built by three people, and has a stable and well defined interface and file format. This seems like an actually tractable project to re-implement if you have enough man-years of funding and a talented enough dev team. Turso seems like they could fit the bill.
In theory it could happen, and the Python project seems to be much closer than I had imagined was possible at this point. But it's not likely to.
Certainly it's plausible that in the next few years it'll be pretty damn easy, but with the rapid and unpredictable development of AI, it's also plausible that humanity will be extinct or that all current programming languages will be abandoned.
I have actually been on a similar workflow with frontier models for a while, but I have an oracle tool that agents can use which basically bundles up a covering set for whatever the agent is working on and feeds it to gemini/gpt5, and generates a highly detailed plan for the agent to follow. It helps a ton with agents derpily exploring code bases, accidentally duplicating functionality, etc.
I use claude code mostly because the economics of the plan are unbeatable if you can queue a lot of work efficiently. Most of the agents are pretty good now if you use sonnet, the new deepseek is pretty good too. I have my own agent written in rust which is kicking the pants off stuff in early tests, once I get an adapter to let me use it with the claude code plan in place I'm going to switch over. Once my agent stabilizes and I have a chance to do some optimization runs on my tool/system prompts the plan is to crush SWEbench :P
Then we’ll have a continuation of the memory safety exploit dumpster fire because these Rust ports tend to use a significant amount of unsafe code.
On the other hand, Fil-C has no unsafe escape hatches.
Think of Fil-C as the more secure but slower/heavier alternative to Rust
The architecture of embedded solutions is different from desktop and server. For example, to prevent memory from fragmenting and for high performance, you do not free memory. Instead you mark that memory (object / struct) as reusable. It is similar to customized heap allocation or pooling.
[0] https://sqlite.org/vfs.html [1] https://en.wikipedia.org/wiki/Micro-Controller_Operating_Sys...
How many years away are we from having AI-enhanced static analysis tools that can accurately look at our C code (after the fact or while we're writing it) and say "this will cause problems, here's a fix" with a level of accuracy sufficient that we can just continue using C?
Interestingly, I agree with your point in general that there's a lot of software that Fil-C might be a good fit for, but I hesitate to say that about any of the examples you listed:
* CPython and Perl5 are the runtimes for notoriously slow GCed languages, and adding the overhead of a second GC seems...inelegant at best, and likely to slow things down a fair bit more.
* Some of them do have reimplementations or viable alternatives in Rust (or Go or the like) underway, like Turso for SQLite.
* More generally, I'd call these foundational, widely-used, actively maintained pieces of software, so it seems plausible to me that they will decide to RiiR.
I think the best fit may be for stuff that's less actively maintained and less performance-critical. There's 50 years of C programs that people still dig out of the attic sometime but aren't putting that much investment into and are running on hardware vastly more powerful than these programs were written for.
https://www.theregister.com/2024/11/16/rusthaters_unite_filc...
Good stuff!
How hard would it be to support Windows?
If I thought it was ambiguous enough to really try to fix, what I would probably default to is an elaboration. For example: "I love the intersection of C, performance and security." The meaning is not exactly the same, but it is more the same than your colon.
As you said, it can be rare to see a case where it truly is ambiguous, but the context here negates that well enough.
https://www.podscan.fm/podcasts/corepy/episodes/episode-21-a...
I found it very interesting.
And because of that it always involves some sort of hidden runtime cost which might bite you eventually and makes it unusable for many tasks.
I'd rather have my resource management verified at compile time and with no runtime overhead. That this is possible is proven by multiple languages now.
That being said, I can imagine some C programs for which using Fil-C is an acceptable trade-off because they just won't be rewritten in language that is safer anytime soon.
There are problem domains where tracing garbage collection simply cannot be avoided. This is essentially always the case when working with problems that involve constructing arbitrary spaghetti-like reference graphs, possibly with cycles. There's no alternative in that case to "making a mess" and dealing with it as you go, because that requirement is inherent in the problem itself.
It would be interesting to have a version of Fil-C that could work with a memory-safe language like Rust to allow both "safe" leaf references (potentially using ownership and reference counting to represent more complex allocated objects, but would not themselves "own" pointers to Fil-C-managed data, thus avoiding the need to trace inside these objects and auto-free them) and general Fil-C managed pointers with possible cycles (perhaps restricted to some arena/custom address space, to make tracing and collecting more efficient). Due to memory safety, the use of "leaf" references could be ensured not to alter the invariants that Fil-C relies on; but managed pointers would nonetheless be available whenever GC could not be avoided.
Malloc and free have runtime overhead — sometimes more than the overhead of garbage collection.
The only way to have no overhead is to statically allocate fixed sized buffers for everything.
When I write in Rust, the process uses very little RAM. BUT, I often spend a lot of time working through ownership issues and other syntax sugar to prove that memory is cleaned up correctly.
When I write in garbage collected languages, I can move a lot faster. Yes, the process uses more RAM, but I can finish a lot more quickly. Depending on the program that I'm writing, finishing quickly may be more important than using as little RAM as possible.
Furthermore, "which is better" isn't always clear. If you're relying on reference counting (smart pointers; or ARC or RC in Rust), you could actually spend more CPU cycles maintaining the count than an optimized garbage collector will spend finding free memory.
(IE, you spend a lot of time working in a RAM efficient language only to end up with a program that trades off RAM efficiency for CPU efficiency. Or even worse, you might miss your window for building a prototype or testing a feature because you became obsessed with a metric that just doesn't matter.)
These are very critical tradeoffs to understand when you make statements like "Garbage collection is and always was an evolutionary dead end," "it feels wrong to make a mess and have some else clean it up inefficiently at some point later," and "hidden runtime cost".
(Remember, sometimes maintaining a reference count uses more CPU than an optimized garbage collector.)
> it feels wrong to make a mess and have some else clean it up inefficiently at some point later.
Yet no one argues for manual register allocation anymore and will gleefully use dozens or even hundreds of locals and then thousands of functions, just trusting the compiler to sort it all out.
We make progress by making the machines implement our nice abstractions.
An evolutionary dead end that is used by probably upwards of 90% of all productively running code, and is the subject of lots of active research, as evidenced by TFA?
> No matter how nice you make it, it feels wrong to make a mess and have some else clean it up inefficiently at some point later.
> And because of that it always involves some sort of hidden runtime cost
Because of what? Your feelings?
1. https://twitter.com/6thgrade4ever/status/1433519577892327424
@pizlonator ?