Perhaps storing the key would take too much space, or checking it would take too much time, or storing it would cause race condition issues in a multithreaded setting?
It’s also weird that GCC gets away with this at all, as many C programs in Linux that compile with GCC make deliberate use of out-of-bounds pointers.
But yeah, if you look at my patch to llvm, you’ll find that:
- I run a highly curated opt pipeline before instrumentation happens.
- FilPizlonator drops flags in LLVM IR that would have permitted downstream passes to perform UB driven optimizations.
- I made some surgical changes to clang CodeGen and some llvm passes to fix some obvious issues from UB
But also let’s consider what would happen if I hadn’t done any of that except for dropping UB flags in FilPizlonator. In that case, a pass before pizlonation would have done some optimization. At worst, that optimization would be a logic error or it would induce a Fil-C panic. FilPizlonator strongly limits UB to its “memory safe subset” by construction.
I call this the GIMSO property (garbage in, memory safety out).
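To make GIMSO concrete, here's a minimal illustration (my example, not from the post): a classic out-of-bounds write that is UB in standard C, so an optimizer is free to assume it never happens, but that Fil-C turns into a deterministic panic instead of silent corruption.

    #include <stdlib.h>

    int main(void) {
        char *p = malloc(8);
        // UB in standard C: write past the end of the allocation.
        // Under Fil-C the store is checked against p's capability,
        // so the worst case is a panic, never memory corruption.
        p[16] = 42;
        return 0;
    }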
> FilPizlonator drops flags in LLVM IR that would have permitted downstream passes to perform UB driven optimizations.
Does this work reliably or did your patches have to fix bugs here? There are LLVM bugs with floating point where the backend doesn't properly respect passed attributes during codegen, which violate the behaviors of user-level flags. I imagine the same thing exists for UB.

LLVM is engineered to be usable as a backend for type-safe/memory-safe languages. And those flags are engineered to work right for implementing the semantics of those languages, provided that you also do the work to avoid other LLVM pitfalls (and FilPizlonator does that work by inserting aggressive checks).
Of course there could be a bug though. I just haven't encountered this particular kind of bug, and I've tested a lot of software (see https://fil-c.org/programs_that_work)
Some programs have a ~4x slowdown. That's also not super common, but it happens.
Most programs are somewhere in the middle.
> for the use-cases where C/C++ are still popular
This is a myth. 99% of the C/C++ code you are using right now is not perf sensitive. It's written in C or C++ because:
- That's what it was originally written in and nobody bothered to write a better version in any other language.
- The code depends on a C/C++ library and there doesn't exist a high quality binding for that library in any other language, which forces the dev to write code in C/C++.
- C/C++ provides the best level of abstraction (memory and syscalls) for the use case.
Great examples are things like shells and text editors, where the syscalls you want to use are exposed at the highest level of fidelity in libc and if you wrote your code in any other language you'd be constrained by that language's library's limited (and perpetually outdated) view of those syscalls.
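To give a flavor of what "highest level of fidelity" means here (an illustrative snippet, not from the thread): flags like these are available the day libc exposes them, while bindings in other languages often trail behind or expose only a subset.

    #include <fcntl.h>
    #include <unistd.h>

    // Open a file relative to an already-open directory fd, refusing to
    // follow symlinks and preventing the fd from leaking across exec.
    int open_in_dir(int dirfd, const char *name) {
        return openat(dirfd, name, O_RDONLY | O_NOFOLLOW | O_CLOEXEC);
    }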
I am curious though. What assumptions do you think I'm making that you think are invalid?
(Browser performance is like megapixels or megahertz … a number that marketing nerds can use to flex, but that is otherwise mostly irrelevant)
When I say 99% of the C code you use, I mean “use” as a human using a computer, not “use” as a dependency in your project. I’m not here to tell you that your C or C++ project should be compiled with Fil-C. I am here to tell you that most of the C/C++ programs you use as an end user could be compiled with Fil-C and you wouldn’t experience a degraded experience if that happened.
People under-estimate how much code gets written in these languages just because decades ago they were chosen as the default language of the project and people are resistant to going full polyglot. Then everything gets written that way including cold paths, utilities, features that are hardly ever used, UI code that's bottlenecked by the network...
Since performance is largely correlated to battery life, of course I would notice. An Nx reduction in battery life would certainly be a degraded experience.
4x slowdown may be absolutely irrelevant in case of a software that spends most of its time waiting on IO, which I would wager a good chunk of user-facing software does. Like, if it has an event loop and does a 0.5 ms calculation once every second, doing the same calculation in 2 ms is absolutely not-noticeable.
For compilers, it may not make as much sense (not even due to performance reasons, but simply because a memory issue taking down the program would still be "well-contained", and memory leaks would not matter much as it's a relatively short-lived program to begin with).
And then there are the truly CPU-bound programs, but seriously, how often do you [1] see your CPU maxed out for long durations on your desktop PC?
[1] not you, pizlonator, just joining the discussion replying to you
They are most commonly running continuously, and reacting to different events.
You can't do the IO work that depends on CPU work ahead of time, and neither can you do CPU work that depends on IO. You have a bunch of complicated interdependencies between the two, and the execution time is heavily constrained by this directed graph. No matter how efficient your data manipulation algorithm is, it doesn't help if you still have to wait for the data to load from the web/file.
Just draw a Gantt chart and, sure, sum the execution times. My point is that due to interdependencies you will have a longest lane, and no matter what you do with the small CPU parts, you can only marginally affect the whole.
It gets even funnier with parallelism (this was just concurrency so far), where a similar concept is known as Amdahl's law.
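For reference, Amdahl's law says that if a fraction p of the total time is affected and that part changes speed by a factor s, the overall speedup is S = 1 / ((1 - p) + p/s). The same bookkeeping works for a slowdown: if, say, 10% of wall time is CPU work and it gets 4x slower, total time scales by 0.9 + 0.4 = 1.3, i.e. a 1.3x end-to-end slowdown despite the 4x-slower CPU part.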
And I would even go as far as to claim that what you may win with C you often lose several-fold by going with a simpler parallelism model for fear of segfaults, whereas in a higher-level language you could parallelize fearlessly.
Also, literally every language claims "only a x2 slowdown compared to C".
We still end up coding in C++, because see the first point.
> All code is perf-sensitive.
That can’t possibly be true. Meta runs on PHP/Hack, which are ridiculously slow. Code running in your browser is JS, which is like 40x slower than Yolo-C++ and yet it’s fine. So many other examples of folks running code that is just hella slow, way slower than “4x slower than C”
I'm doing some for loops in bash right now that could use 1000x more CPU cycles without me noticing.
Many programs use negligible cycles over their entire runtime. And even for programs that spend a lot of CPU and need tons of optimizations in certain spots, most of their code barely ever runs.
> Also, literally every language claims "only a x2 slowdown compared to C".
I've never seen anyone claim that a language like python (using the normal implementation) is generally within the same speed band as C.
The benchmark game is an extremely rough measure but you can pretty much split languages into two groups: 1x-5x slowdown versus C, and 50x-200x slowdown versus C. Plenty of popular languages are in each group.
Live long enough and you will. People claimed it about PyPy back in the day when it was still hype.
- Code that uses SIMD or that is mostly dealing with primitive data in large arrays will get close to 1x
- Code that walks trees and graphs, like interpreters or compilers do, might end up north of 2x unless I am very successful at implementing all of the optimizations I am envisioning.
- Code that is IO bound or interactive is already close to 1x
That actually brings up another question: how would trying to run a JIT like V8 inside Fil-C go? I assume there would have to be some bypass/exit before jumping to generated code - would there need to be other adjustments?
I’ll admit that if you are in the business of counting instructions then other things in Fil-C will kill you. Most of the overhead is from pointer chasing.
Those are the environments that John upthread was talking about when he said:
> There's tons of embedded use cases where a GC is not going to fly just from a code size perspective, let alone latency. That's mostly where I've often seen C (not C++) for new programs.
But I've seen C++ there too.
If you're worried about the code size of a GC you probably don't have a filesystem.
But I don't really think it's meaningful to bring that up as it is a niche of a niche. Soft real-time (which most people may end up touching, e.g. video games) is much more forgiving; see all the games running on Unity with a GC. An occasional frame drop won't cause an explosion here, and managed languages are more than fine.
I don't agree that "it is a niche of a niche". There are probably 32× as many computers in your house running hard-real-time software as computers that aren't. Even Linux used to disable interrupts during IDE disk accesses!
I say that even though, as you noticed in another reply, I worked on research to try to make GC suitable for exactly those environments. I had some cool demos, and a lot of ideas in FUGC come from that. But I would not recommend you use GC in those environments!
There is a way to engineer Fil-C to not rely on GC. InvisiCaps would work with isoheaps (what those embedded dudes would just call "object pools"). So, if we wanted to make a Fil-C-for-flight-software then that's what it would look like, and honestly it might even be super cool
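For readers who haven't seen the term: an isoheap here is essentially what embedded folks call an object pool, one heap per type, with slots reused but never handed out as a different type. A minimal sketch (names and sizes invented for illustration):

    #include <stddef.h>

    #define POOL_SLOTS 64
    typedef struct { char payload[32]; } msg_t;

    static msg_t slots[POOL_SLOTS];      // storage reserved up front
    static msg_t *free_list[POOL_SLOTS]; // stack of available slots
    static size_t free_top;

    void pool_init(void) {
        for (size_t i = 0; i < POOL_SLOTS; i++)
            free_list[free_top++] = &slots[i];
    }

    msg_t *pool_alloc(void) {  // O(1), no fragmentation, no GC needed
        return free_top ? free_list[--free_top] : NULL;
    }

    void pool_free(msg_t *m) { // slot is only ever reused as another msg_t
        free_list[free_top++] = m;
    }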
Meh. I was in the real time GC game for a while, when I was younger. Nobody agrees on what it really means to bound the worst case. If you're a flight software engineer, it means one thing. If you're a game developer, it means something else entirely. And if you're working on the audio stack specifically, it means yet another thing (somewhere in between game and flight).
So let me put it this way, using the game-audio-flight framework:
- Games: I bound worst case execution time, just assuming a fair enough OS scheduler, even on uniprocessor.
- Audio: I bound worst case execution time if you have multiple cores.
- Flight: I don't bound worst case execution time. Your plane crashes and everyone is dead
This is bottlenecked on memory access that is challenging to avoid in C. You could speed it up by at least 2× with some compiler support, and maybe even without it, but I haven't figured out how. Do you have any ideas?
Typically, though, when you are trying to do WCET analysis, as you know, you try to avoid any dynamic allocation in the time-sensitive part of the program. After all, if completing a computation after a deadline would cause a motor to catch fire or something, you definitely don't want to abort the computation entirely with an out-of-memory exception!
Some garbage collectors can satisfy this requirement just by not interfering with code that doesn't allocate, but typically not concurrent ones.
> how would trying to run a JIT like V8 inside Fil-C go?
You’d get a Fil-C panic. Fil-C wouldn’t allow you to PROT_EXEC lol
What if I also use cars, and airplanes, and dishwashers, and garage doors, and dozens of other systems? At what point does most of the code I interact with /not/ have lots of breathing room? Or does the embedded code that makes the modern world run not count as "programs"?
First of all, I’m not advocating that people use Fil-C in places where it makes no sense. I wouldn’t want my car’s control system to use it.
Car systems are big if they have 100 million lines of code, or maybe a billion. But your desktop OS is at like 10 billion and growing! Throw in the code that runs in servers that you rely on and we might be at 100 billion lines of C or C++.
* Compilers (including clang)
* Most interpreters (Python, Ruby, etc.)
* Any simulation-heavy video game (and some others)
* VSCode (guess I should've stuck with Sublime)
* Any scientific computing tools/libraries
Sure, I probably won't notice if zsh or bash got 2x slower and cp will be IO bound anyway. But if someone made a magic clang pass that made most programs 2x faster they'd be hailed as a once-in-a-generation genius, not blown off with "who really cares about C/C++ performance anyway?". I'm not saying there's no place for trading these overheads for making C/C++ safer, but treating it as a niche use-case for C/C++ is ludicrous.
Ruby is partially written in Rust nowadays.
VSCode uses plenty of Rust and .NET AOT on its extensions, alongside C++, and more recently Webassembly, hence why it is the only Electron garbage with acceptable performance.
Unity and Unreal share a great deal of games, with plenty of C#, Blueprints, Verse, and a GC for C++.
* DAWs and audio plugins
* video editors
Audio plugins in particular need to run as fast as possible because they share the tiny time budget of a few milliseconds with dozens or even hundreds of other plugin instances. If everything is suddenly 2x slower, some projects simply won't run in realtime anymore.
Reasons why it's stupidly bad:
- So many missing compiler optimizations (obviously those will also improve perf too).
- When the compiler emits metadata for functions and globals, like to support accurate GC and the stack traces you get on Fil-C panic, I use a totally naive representation using LLVM structs. Zero attempt to compress anything. I'm not doing any of the tricks that DWARF would do, for example.
- In many cases it means that strings, like names of functions, appear twice (once for the purposes of the linker and a second time for the purposes of my metadata).
- Lastly, an industrially optimized version of Fil-C would ditch ELF and just have a Fil-C-optimized linker format. That would obviate the need for a lot of the cruft I emit that allows me to sneakily make ELF into a memory safe linker. Then code size would go down by a ton
I wish I had data handy on just how much I bloat code. My totally unscientific guess is like 5x
It is decades of backend optimisation work, some of it exploring UB-based optimizations, that has made that urban myth possible.
As the .NET team discovered, and points out on each release since .NET 5 in lengthy blog posts able to kill most browsers' buffers, if the team puts as much work into the JIT and AOT compilers as the Visual C++ team does, then performance looks quite different from what everyone else expects it naturally to be.
In JS for example, if you can write your code as a tight loop operating on ArrayBuffer views you can achieve near C performance. But that’s only if you know what optimizations JS engines are good at and have a mental model how processors respond to memory access patterns, which very few JS developers will have. It’s still valid to note that idiomatic JS code for an arbitrary CPU-bound task is usually at least tens of times slower than idiomatic C.
I think an important issue is that for performance-sensitive C++ stuff and related domains, it's somewhat all or nothing with a lot of these tools. Like, a CAD program is ideally highly performant, but I also don't want it to own my machine if I load a malicious file. I think that's the hardest thing, and there isn't any easy lift-and-shift solution for it.
I think some C++ projects probably could actually accept a 2x slowdown, honestly. Like I'm not sure if LibrePCB taking 2x as long in cycles would really matter. Maybe it would.
IIRC Crubit C++/Rust Interop is from the chrome team: https://github.com/google/crubit
That might well change, but it's what their docs currently say.
It's not, actually: https://source.chromium.org/chromium/chromium/src/+/main:doc...
> Rust can be used anywhere in the Chromium repository (not just //third_party) subject to current interop capabilities, however it is currently subject to a internal approval and FYI process. Googlers can view go/chrome-rust for details. New usages of Rust are documented at rust-fyi@chromium.org.
It is true that two years ago it was only third party, but it's been growing ever since.
Here’s my source. I’m porting Linux From Scratch to Fil-C
There is load bearing stuff in there that I’d never think of off the top of my head that I can assure you works just as well even with the Fil-C tax. Like I can’t tell the difference and don’t care that it is technically using more CPU and memory.
So then you’ve got to wonder, why aren’t those things written in JavaScript, or Python, or Java, or Haskell? And if you look inside you just see really complex syscall usage. Not for perf but for correctness. It’s code that would be zero fun to try to write in anything other than C or C++.
It’s a typed shell. So you can do jq-like data manipulation against a plethora of different documents. Unlike Zsh et al that are still ostensibly limited to just whitespace-delimited lists.
How does it compare to something like RLBox?
> This is a myth. 99% of the C/C++ code you are using right now is not perf sensitive.
I don't think that's true, or at best it's a very contorted definition of "perf sensitive". Most code is performance sensitive in my opinion - even shitty code written in Python or Ruby. I would like it to be faster. Take Asciidoctor for example. Is that "performance sensitive"? Hell yes!
I don’t know and it doesn’t matter because RLBox doesn’t make your C code memory safe. It only containerizes it.
Like, if you put a database in RLBox then a hacker could still use a memory safety bug to corrupt or exfiltrate sensitive data.
If you put a browser engine in RLBox then a hacker could still pwn your whole machine:
- If your engine has no other sandbox other than RLBox then they’d probably own your kernel by corrupting a buffer in memory that is being passed to a GPU driver (or something along those lines). RLBox will allow that corruption because the buffer is indeed in the program’s memory.
- If the engine has some other sandbox on top of RLBox then the bad guys will corrupt a buffer used for sending messages to brokers, so they can then pop those brokers. Just as they would without RLBox.
Fil-C prevents all of that because it uses a pointer capability model and enforces it rigorously.
So, RLBox could be infinitely faster than Fil-C and I still wouldn’t care.
So the question of performance is still relevant, even if RLBox's security properties are less tight.
They're just way too different. RLBox is a containerization technology. Fil-C is a memory safety technology.
Like, there's a world where you would use both of them stacked on top of one another, because Fil-C does memory safety without containerizing while RLBox does containerization without memory safety.
It's like comparing apples and oranges - which I always found to be a weird saying because of course it makes sense to compare apples and oranges.
Edit: looked it up and apparently it was originally "apples and oysters" which makes way more sense
https://english.stackexchange.com/a/132907/114887
Although even then it isn't incorrect to compare them. Perhaps you have to choose a canapé; we could have tiny slices of delicious apples, or succulent oysters. But it's impossible to choose because there's no way to compare them arrrghhh!
I wonder what the "most different" two things are.
Most of us sadly cargo-cult urban myths and, like Saint Thomas, only believe in what is running in front of us, so we never get any kind of feeling for how great some things could be.
Hence so many UNIX and C clones instead of something new. To be honest, those two guys were also visionaries back in the 1970s, despite some of the flaws.
> The fast path of a pollcheck is just a load-and-branch.
A neat technique I've seen used to avoid these branches is documented at https://android-developers.googleblog.com/2023/11/the-secret... under "Implicit suspend checks".
I’m very far from doing those kinds of low level optimizations because I have a large pile of very high level and very basic optimizations still left to do!
I am working on adding threads to Virgil (slowly, in the background, heh). I'll use the simple load+branch from the TLS for the simple reason that the GC code is also written in Virgil and it must be possible to turn off safepoints that have been inserted into the GC code itself, which is easy and robust to do if they are thread-local.
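For anyone unfamiliar with the idiom, the load+branch fast path being discussed looks roughly like this (a sketch; the field and function names are invented, not Fil-C's or Virgil's actual runtime API):

    #include <stdbool.h>

    struct thread_state {
        bool pollcheck_requested;  // set by the GC during a soft handshake
        /* ...other per-thread runtime state... */
    };

    extern __thread struct thread_state my_thread;   // thread-local, GCC/Clang extension
    void pollcheck_slow(struct thread_state *);      // report roots, ack the handshake

    static inline void pollcheck(void) {
        // Fast path: one thread-local load and one predictable branch.
        if (__builtin_expect(my_thread.pollcheck_requested, 0))
            pollcheck_slow(&my_thread);              // taken only when the GC asks
    }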
This article glosses over what I consider the hardest part - the enter/exit functionality around native functions that may block (but which must touch the allocator).
No it's not, not even close.
> This article glosses over what I consider the hardest part - the enter/exit functionality around native functions that may block (but which must touch the allocator).
Yeah, that part is hard, and maybe I'll describe it in another post.
Look for `filc_enter` and `filc_exit` in https://github.com/pizlonator/fil-c/blob/deluge/libpas/src/l...
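The shape of the pattern, as I read it (a sketch only; the real filc_enter/filc_exit in the linked source may take different arguments and do considerably more):

    #include <unistd.h>

    // Assumed declarations, for illustration only.
    void filc_enter(void);
    void filc_exit(void);

    // While "exited", the GC can treat this thread as if it were parked at a
    // safepoint, so a blocking syscall can't stall a soft handshake. The
    // thread must not touch GC-managed memory until it re-enters.
    ssize_t read_blocking(int fd, void *buf, size_t len) {
        filc_exit();
        ssize_t n = read(fd, buf, len);
        filc_enter();
        return n;
    }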
Never heard of this one, looking forward to diving in this weekend.
The stack scan is really fast. There's not a lot of logic in there. If you max out the stack height limit (megabytes of stack?) then maybe that means milliseconds of work to scan that stack. That's still not bad.
Probably not. Your game would be inches of stack away from crashing
(It would be a linear scan if I was just conservatively scanning, but then I'd have other problems.)
This is one of the most counterintuitive parts of GC perf! You'd think that the stack scan had to be a bottleneck, and it might even be one in some corner cases. But it's just not the longest pole in the tent most of the time, because you're so unlikely to actually have a 1MB stack, and programs that do have a 1MB stack tend to also have ginormous heaps (like many gigabytes), which then means that even if the stack scan is a problem it's not the problem.
I don't know, maybe the fact that I'm disagreeing with someone who knows a lot more than I do about the issue should be a warning sign that I'm probably wrong?
(I’m not trying to BS my way here - I’m explaining the reason why on the fly GC optimization almost never involves doing stuff about the time it takes to scan stack. It’s just not worth it. If it was, we’d be seeing a lot of stacklet type optimizations.)
To be fair, FUGC doesn’t currently let you do that. The GC runs in a separate thread and soft handshakes at various points, which cause your game thread to react at poll checks and exits that might not be at end of tick.
But I could add a feature that lets you force handshake responses to be at end of tick! That sounds like a good idea.
Because if they get converted to integers and then stored to the heap then they lose their capability. So accesses to them will trap and the GC doesn’t need to care about them.
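A tiny example of the scenario being described (my illustration; behavior as explained above):

    #include <stdint.h>
    #include <stdlib.h>

    int main(void) {
        int *p = malloc(sizeof *p);
        uintptr_t *slot = malloc(sizeof *slot);
        *slot = (uintptr_t)p;     // stored to the heap as a plain integer:
                                  // the capability does not travel with it
        int *q = (int *)*slot;    // reconstructed from the integer alone
        *q = 7;                   // per the explanation above, this traps in
                                  // Fil-C, so the GC never needs to treat
                                  // *slot as a root
        return 0;
    }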
Also it’s not “Dijkstra accurate”. It’s a Dijkstra collector in the sense that it uses a Dijkstra barrier. And it’s an accurate collector. But these are orthogonal things
https://github.com/protocolbuffers/protobuf/blob/cb873c8987d...
    // This somewhat silly looking add-and-subtract behavior provides provenance
    // from the original input buffer's pointer. After optimization it produces
    // the same assembly as just casting `(uintptr_t)ptr+input_delta`
    // https://godbolt.org/z/zosG88oPn
    size_t position =
        (uintptr_t)ptr + e->input_delta - (uintptr_t)e->buffer_start;
    return e->buffer_start + position;
It does use the implementation-defined behavior that a char pointer + 1 casted to uintptr is the same as casting to uintptr then adding 1.

Code that strives to preserve provenance works in Fil-C.
Let's say I malloc(42) then print the address to stdout, and then do not otherwise do anything with the pointer. Ten minutes later I prompt the user for an integer, they type back the same address, and then I try to write 42 bytes to that address.
What happens?
Edit: ok I read up on GC literature briefly and I believe I understand the situation.
"conservative" means the garbage collector does not have access to language type system information and is just guessing that every pointer sized thing in the stack is probably a pointer.
"accurate" means the compiler tells the GC about pointer types, so it knows about all the pointers the type system knows about.
Neither of these are capable of correctly modeling the C language semantics, which allows ptrtoint / inttoptr. So if there are any tricks being used like xor linked lists, storing extra data inside unused pointer alignment bits, or a memory allocator implementation, these will be incompatible even with an "accurate" garbage collector such as this.
I should add, this is not a criticism, I'm just trying to understand the design space. It's a pretty compelling trade offer: give up ptrtoint, receive GC.
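For example, the low-bit tagging trick mentioned above, in minimal form (illustrative only): legal-ish in plain C, but the stored value is no longer something an accurate GC or a capability system can recognize as a reference.

    #include <stdint.h>
    #include <stdlib.h>

    typedef struct node { int value; } node;

    // Stash a one-bit flag in the low bit of an aligned pointer.
    static uintptr_t tag(node *p, int flag) { return (uintptr_t)p | (flag & 1); }
    static node *untag(uintptr_t t)         { return (node *)(t & ~(uintptr_t)1); }

    int main(void) {
        node *n = malloc(sizeof *n);
        uintptr_t stored = tag(n, 1);  // only the tagged integer is kept around
        untag(stored)->value = 7;      // fine under a conventional toolchain,
                                       // incompatible with the accurate GC /
                                       // Fil-C model discussed here
        free(n);
        return 0;
    }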
In particular, check out the sections called "Laundering Pointers As Integers" and "Laundering Integers As Pointers".
> This is because the capability is not stored at any addresses that are accessible to the Fil-C program.
How are they stored? Is the GC running in a different process?
The list of supported software is astounding: CPython, SQLite, OpenSSH, ICU, CMake, Perl5, and Bash, for example. There are a lot of things in that list that nobody is likely to ever rewrite in Rust.
I wonder if it's feasible to use Fil-C to do multitasking between mutually untrusted processes on a computer without an MMU? They're making all the right noises about capability security and nonblocking synchronization and whatnot.
Does anyone have experience using it in practice? I see that https://news.ycombinator.com/item?id=45134852 reports a 4× slowdown or better.
The name is hilarious. Feelthay! Feelthay!
You could. That said, FUGC’s guts rely on OS features that in turn rely on an MMU.
But you could make a version of FUGC that has no such dependency.
As for perf - 4x is the worst case and that number is out there because I reported it. And I report worst case perf because that’s how obsessive I am about realistically measuring, and then fanatically resolving, perf issues
Fact is, I can live on the Fil-C versions of a lot of my favorite software and not tell the difference
What's the minimal code size overhead for FUGC?
I don’t have good data on this.
The FUGC requires the compiler to emit extra metadata and that metadata is hilariously inefficient right now. I haven’t bothered to optimize it. And the FUGC implementation pulls in all of libpas even though it almost certainly doesn’t have to.
So I don’t know what minimal looks like right now
I guess there would be no way to verify that precompiled user programs actually enforce the security boundaries. The only way to guarantee safety in such a system would be to compile everything from source yourself.
The incentives are strong to drive the runtime cost as close to zero as possible, which involves making your proof-checking system so expressive that it's hard to get right. The closer you get to zero, the more performance-sensitive your userbase gets. No part of your userbase is actively testing the parts of your verifier that reject programs; they try to avoid generating programs that get rejected, and as the verifier gets larger and larger, it requires more and more effort to generate code that exploits one of its bugs, although there are more and more of them. As the effort required to slip malicious code past the verifier grows, the secret in-house tools developed by attackers give them a larger and larger advantage over the defenders.
Continued "maintenance" applied to the verifier drive its size and complexity up over time, driving the probability of a flaw inexorably toward 100%, while, if it is not "maintained" through continuous changes, it will break as its dependencies change, it will be considered outdated, and nobody will understand it well enough to fix a bug if it does surface.
We've seen this happen with Java, and although it's clearly not unavoidable in any kind of logical sense, it's a strong set of social pressures.
Dynamic checking seems much more memetically fit: developers will regularly write code that should fail the dynamic checks, and, if it passes instead, they will send you an annoyed bug report about how they had to spend their weekend debugging.
We've seen a lot of complexity happen with Java but that was generally in the standard library facilities that are bundled with the language and runtime, not really the basic JVM type checking pass. Proof carrying code is closer to the latter.
There have been numerous holes discovered in various implementations of the basic JVM type checking, often after existing for many years.
...of sufficient power, eg. that can model arithmetic with both addition and multiplication. I think the caveats are important because systems that can't fully model arithmetic are often still quite useful!
What would happen if you tried to do PCC for InvisiCaps and FUGC is that it would ultimately constrain what optimizations are possible, because the optimizer would only be able to pick from the set of tricks that it could express a proof for within whatever proof system we picked
Totally not the end of the world.
Do I think this an interesting thing to actually do? I’m not sure. It’s certainly not the most interesting thing to do with Fil-C right now
Maybe you're right, and those problems are not inevitable; for example, if you could reduce the proofs to a tiny MetaMath-like kernel that wouldn't need constant "maintenance". As you say, that could move the compiler's optimizer out of the TCB — at least for the security properties Fil-C is enforcing, though the optimizer could still cause code to compute the wrong answers.
That seems like something people would want if they were already living in a world where the state of computer security was much better than it is now.
IBM i applications are compiled to a hardware-independent intermediate representation called TIMI, which the SLIC (kernel) can then compile down to machine code, usually at program installation time. As the SLIC is also responsible for maintaining system security, there's no way for a malicious user to sneak in a noncompliant program.
And unlike AS400, I don't think either Smalltalk or Lisp machines used the bytecode abstraction to achieve security.
I have found bugs in the software that I’ve ported, yeah.
I'm actually working on my own language that has a non-moving GC. It uses size classes (so 16 byte objects, 32 byte objects etc.), each of which is allocated in a contiguous slab of memory. Occupancy is determined by a bitmap, 1 bit for each slot in the slab.
The GC constructs a liveness bitmap for the size class, and the results are ANDed together, 'freeing' the memory. If you fill the slab with dead objects, then run the GC, it will not walk anywhere on this slab, create an all zero liveness bitmap, and free the memory.
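A sketch of that sweep, assuming one bit per slot and 64-bit words (details invented for illustration):

    #include <stdint.h>
    #include <string.h>

    #define SLOTS_PER_SLAB 256
    #define WORDS (SLOTS_PER_SLAB / 64)

    typedef struct {
        uint64_t allocated[WORDS]; // 1 = slot is in use
        uint64_t live[WORDS];      // rebuilt by the marker each GC cycle
    } slab_bits;

    // Sweep is one AND per 64 slots: dead objects are never touched
    // individually, and a slab full of dead objects collapses to all-zero
    // words without walking the slab's contents at all.
    void sweep(slab_bits *s) {
        for (int i = 0; i < WORDS; i++)
            s->allocated[i] &= s->live[i];
        memset(s->live, 0, sizeof s->live); // reset for the next cycle
    }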
The liveness bitmap approach is pretty widespread at this point; jemalloc works the same way IIRC.
Still, I think that counts as "visiting" in the context of this discussion: https://news.ycombinator.com/item?id=45137139
I agree that it's easy to add in a visitation pass, where you take the bitmap of live objects after marking and diff it with the currently existing one in order to signal that you might have a leak.
So basically, I think we're like 99% in agreement.
Typically bitmap-based allocators don't actually allow a 16-byte size class to have 32-byte objects in it, but I haven't looked at FUGC to see if that's true of it.
    // assume free_map is the bitmap where 1 means free
    uint32_t free_map;
    uint32_t free_map_2 = (free_map & (free_map >> 1)); // so on and so forth
I haven't really done anything like this yet, it has certain disadvantages, but you can pack multiple size classes into the same bitmap; you do a bit more work during alloc and resolving interior pointers is a bit more costly (if you have those), in exchange for having fewer size classes.
    t &= t << 1;
    t &= t << 2;
    t &= t << 2;
and that sort of thing is pretty appealing, but you lose the ability to know what size an object is just by looking at an address, and it's still a lot slower than scanning for an open slot in a page of 5× bigger objects.

Should I assume from your use of uint32_t that you're targeting embedded ARM microcontrollers?
A bunch of other optimizations fall out from doing that
Well, when using a bitmap (as they seem to do in the article), multiple consecutive dead objects are considered to be in the same range, because multiple consecutive bits in the bitmap have the value zero. There is no need to visit each zero bit in the bitmap separately.
But the malloc library could be fully integrated into the GC mechanism. In this case, there is no need. That's probably much easier, and faster, because the malloc can be simplified to bumping a pointer.
I do find the statement dubious still, do you mind clearing it up for me?
Given a page { void* addr; size_t size; size_t alignment; BitMap used; } where used's size in bits is page.size / page.alignment, surely we only need to visit the used bitmap for marking a memory slot as free?
FUGC does a SIMD sweep over a bit vector kept on the side, so it doesn’t visit the dead objects at all in the sense that it doesn’t touch their contents. And it only visits them in the sense that a single instruction deals with many dead or alive objects at once.
How much more memory do GC programs tend to use?
Curious, how do you deal with interior pointers, and not being able to store type info in object headers, like most GC languages do (considering placement new is a thing, you can't have malloc allocate a header then return the following memory, and pointer types can lie about what they contain)?
You mention 'accurate' by which I assume you use the compiler to keep track of where the pointers are (via types/stackmaps).
How do you deal with pointers that get cast to ints, and then back?
Avoid pointer chasing. Use SIMD.
> How much more memory do GC programs tend to use?
I would estimate 2x
Fil-C has additional overheads not related to GC, so maybe it’s higher. I haven’t started measuring and optimizing memory use in anger.
> Curious, how do you deal with interior pointers, and not being able to store type info in object headers, like most GC languages do (considering placement new is a thing, you can't have malloc allocate a header then return the following memory, and pointer types can lie about what they contain)?
I love the concept of Fil-C but I find that with the latest release, a Fil-C build of QuickJS executes bytecode around 30x slower than a regular build. Admittedly this is an informal benchmark running on a GitHub CI runner. I’m not sure if virtualization introduces overheads that Fil-C might be particularly sensitive to (?). But I’ve sadly yet to see anything close to a 4x performance difference. Perhaps I will try running the same benchmark on native non-virtualized x86 later today.
Also, so I am not just whining, my Fil-C patch to the QuickJS main branch contains a fix for an issue that’s only triggered by regex backtracking, and which I think you might have missed in your own QuickJS patch:
http://github.com/addrummond/jsockd/blob/main/fil-c-quickjs....
I know that I regressed quickjs recently when I fixed handling of unions. It’s a fixable issue, I just haven’t gone back and fixed it yet.
I definitely don’t see 30x overhead on anything else I run.
But thanks for pointing that out, I should probably actually fix the union handling the right way.
(What’s happening is every time quickjs bit casts doubles to pointers, that’s currently doing a heap allocation. And it’s obviously not needed. The simplest compiler analysis would kill it. I just turned off the previous instance of that analysis because it had a soundness issue)
Even if it worked for normal data flow, that's the sort of thing that's bound to introduce covert channels, I'd have thought. To start with I guess you have immediately disabled the mitigations of meltdown/spectre, because doesn't that happen when you switch processes?
But consider, for example, decoding JPEG, or maybe some future successor to JPEG, JEEG, by the Joint Evil Experts Group. You want to look at a ransom note that someone in the JEEG has sent you in JEEG format so that you know how much Bitcoin to send them. You have a JEEG decoder, but it was written by Evil Experts, so it might have vulnerabilities, as JPEG implementations have in the past, and maybe the ransom note JEEG is designed to overflow a buffer in it and install a rootkit. Maybe the decoder itself is full of malicious code just waiting for the signal to strike!
If you can run the JEEG decoder in a container that keeps it from accessing the network, writing to the filesystem, launching processes, executing forever, allocating all your RAM, etc., only being permitted to output an uncompressed image, even if you let it read the clock, it probably doesn't matter if it launches some kind of side-channel attack against your Bitcoin wallet and your Bitchat client, because all it can do is put the information it stole into the image you are going to look at and then discard.
You can contrive situations where it can still trick you into leaking bits it stole back to the JEEG (maybe the least significant bits of the ransom amount) but it's an enormous improvement over the usual situation.
Then, FalseType fonts...
CPython in Rust https://github.com/RustPython/RustPython
Bash in Rust https://github.com/shellgei/rusty_bash
> Warning: This software is ALPHA, only use for development, testing, and experimentation. We are working to make it production ready, but do not use it for critical data right now.
https://rustpython.github.io/pages/whats-left says:
> RustPython currently supports the full Python syntax. This is “what’s left” from the Python Standard Library.
Rusty_bash says:
> Currently, the binary built from alpha repo has passed 24 of 84 test scripts.
The CPython implementation is farther along than I had expected! I hope they make more progress.
Especially for the Turso project: if you look under "Insights -> Contributors" on their Github page, it's clear that the project is under heavy active development, and they have an actual funded business startup that wants to sell access to a cloud version of Turso, so they are definitely incentivized to complete it.
SQLite was built by three people, and has a stable and well defined interface and file format. This seems like an actually tractable project to re-implement if you have enough man-years of funding and a talented enough dev team. Turso seems like they could fit the bill.
In theory it could happen, and the Python project seems to be much closer than I had imagined was possible at this point. But it's not likely to.
Certainly it's plausible that in the next few years it'll be pretty damn easy, but with the rapid and unpredictable development of AI, it's also plausible that humanity will be extinct or that all current programming languages will be abandoned.
I have actually been on a similar workflow with frontier models for a while, but I have an oracle tool that agents can use which basically bundles up a covering set for whatever the agent is working on and feeds it to gemini/gpt5, and generates a highly detailed plan for the agent to follow. It helps a ton with agents derpily exploring code bases, accidentally duplicating functionality, etc.
I use claude code mostly because the economics of the plan are unbeatable if you can queue a lot of work efficiently. Most of the agents are pretty good now if you use sonnet, the new deepseek is pretty good too. I have my own agent written in rust which is kicking the pants off stuff in early tests, once I get an adapter to let me use it with the claude code plan in place I'm going to switch over. Once my agent stabilizes and I have a chance to do some optimization runs on my tool/system prompts the plan is to crush SWEbench :P
Then we’ll have a continuation of the memory safety exploit dumpster fire because these Rust ports tend to use a significant amount of unsafe code.
On the other hand, Fil-C has no unsafe escape hatches.
Think of Fil-C as the more secure but slower/heavier alternative to Rust
The architecture of embedded solutions is different from desktop and server. For example, to prevent memory from fragmenting and for high performance, you do not free memory. Instead you mark that memory (object / struct) as reusable. It is similar to customized heap allocation or pooling.
[0] https://sqlite.org/vfs.html [1] https://en.wikipedia.org/wiki/Micro-Controller_Operating_Sys...
How many years away are we from having AI-enhanced static analysis tools that can accurately look at our C code (after the fact or while we're writing it) and say "this will cause problems, here's a fix" with a level of accuracy sufficient that we can just continue using C?
Interestingly, I agree with your point in general that there's a lot of software that Fil-C might be a good fit for, but I hesitate to say that about any of the examples you listed:
* CPython and Perl5 are the runtimes for notoriously slow GCed languages, and adding the overhead of a second GC seems...inelegant at best, and likely to slow things down a fair bit more.
* Some of them do have reimplementations or viable alternatives in Rust (or Go or the like) underway, like Turso for SQLite.
* More generally, I'd call these foundational, widely-used, actively maintained pieces of software, so it seems plausible to me that they will decide to RiiR.
I think the best fit may be for stuff that's less actively maintained and less performance-critical. There's 50 years of C programs that people still dig out of the attic sometime but aren't putting that much investment into and are running on hardware vastly more powerful than these programs were written for.
https://www.theregister.com/2024/11/16/rusthaters_unite_filc...
Good stuff!
How hard would it be to support Windows?
If I thought it was ambiguous enough to really try to fix, what I would probably default to is an elaboration. For example: "I love the intersection of C, performance and security." The meaning is not exactly the same, but it is more the same than your colon.
As you said, it can be rare to see a case where it truly is ambiguous, but the context here negates that well enough.
https://www.podscan.fm/podcasts/corepy/episodes/episode-21-a...
I found it very interesting.
And because of that it always involves some sort of hidden runtime cost which might bite you eventually and makes it unusable for many tasks.
I'd rather have my resource management verified at compile time and with no runtime overhead. That this is possible is proven by multiple languages now.
That being said, I can imagine some C programs for which using Fil-C is an acceptable trade-off because they just won't be rewritten in language that is safer anytime soon.
There are problem domains where tracing garbage collection simply cannot be avoided. This is essentially always the case when working with problems that involve constructing arbitrary spaghetti-like reference graphs, possibly with cycles. There's no alternative in that case to "making a mess" and dealing with it as you go, because that requirement is inherent in the problem itself.
It would be interesting to have a version of Fil-C that could work with a memory-safe language like Rust to allow both "safe" leaf references (potentially using ownership and reference counting to represent more complex allocated objects, but would not themselves "own" pointers to Fil-C-managed data, thus avoiding the need to trace inside these objects and auto-free them) and general Fil-C managed pointers with possible cycles (perhaps restricted to some arena/custom address space, to make tracing and collecting more efficient). Due to memory safety, the use of "leaf" references could be ensured not to alter the invariants that Fil-C relies on; but managed pointers would nonetheless be available whenever GC could not be avoided.
Malloc and free have runtime overhead — sometimes more than the overhead of garbage collection.
The only way to have no overhead is to statically allocate fixed sized buffers for everything.
When I write in Rust, the process uses very little RAM. BUT, I often spend a lot of time working through ownership issues and other syntax sugar to prove that memory is cleaned up correctly.
When I write in garbage collected languages, I can move a lot faster. Yes, the process uses more RAM, but I can finish a lot more quickly. Depending on the program that I'm writing, finishing quickly may be more important than using as little RAM as possible.
Furthermore, "which is better" isn't always clear. If you're relying on reference counting (smart pointers; or ARC or RC in Rust), you could actually spend more CPU cycles maintaining the count than an optimized garbage collector will spend finding free memory.
(IE, you spend a lot of time working in a RAM efficient language only to end up with a program that trades off RAM efficiency for CPU efficiency. Or even worse, you might miss your window for building a prototype or testing a feature because you became obsessed with a metric that just doesn't matter.)
These are very critical tradeoffs to understand when you make statements like "Garbage collection is and always was an evolutionary dead end," "it feels wrong to make a mess and have some else clean it up inefficiently at some point later," and "hidden runtime cost".
(Remember, sometimes maintaining a reference count uses more CPU than an optimized garbage collector.)
> it feels wrong to make a mess and have some else clean it up inefficiently at some point later.
Yet no one argues for manual register allocation anymore and will gleefully use dozens or even hundreds of locals and then thousands of functions, just trusting the compiler to sort it all out.
We make progress by making the machines implement our nice abstractions.
An evolutionary dead end that is used by probably upwards of 90% of all productively running code, and is the subject of lots of active research, as evidenced by TFA?
> No matter how nice you make it, it feels wrong to make a mess and have some else clean it up inefficiently at some point later.
> And because of that it always involves some sort of hidden runtime cost
Because of what? Your feelings?
1. https://twitter.com/6thgrade4ever/status/1433519577892327424
@pizlonator ?