Skip Navigation
InitialsDiceBearhttps://github.com/dicebear/dicebearhttps://creativecommons.org/publicdomain/zero/1.0/„Initials” (https://github.com/dicebear/dicebear) by „DiceBear”, licensed under „CC0 1.0” (https://creativecommons.org/publicdomain/zero/1.0/)AP
armchair_progamer @programming.dev
Posts 239
Comments 49

“C has known memory footguns. C++, due to its complexity, has footnukes.” - Hacker News comment

11

graydon/rust-prehistory: historical archive of rust pre-publication development

github.com GitHub - graydon/rust-prehistory: historical archive of rust pre-publication development

historical archive of rust pre-publication development - graydon/rust-prehistory

GitHub - graydon/rust-prehistory: historical archive of rust pre-publication development

Excerpt:

> This is a reconstruction -- extracted and very lightly edited -- of the "prehistory" of Rust development, when it was a personal project between 2006-2009 and, after late 2009, a Mozilla project conducted in private. > > The purposes of publishing this now are: > > - It might encourage someone with a hobby project to persevere > - It might be of interest to language or CS researchers someday > - It might settle some arguments / disputed historical claims

Rust started being developed 18 years ago. This is how it looked until 14 years ago, where I believe the rest of development is on rust-lang/rust. The first Rust program looks completely different than the Rust we know today.

1

Firedancer: DSL for defining bullet-hell patterns (demo)

firedancer-lang.com Firedancer

Haxe-based language for defining 2D shmups bullet-hell patterns.

Firedancer

GitHub

> Haxe-based language for defining 2D shmups bullet-hell patterns.

The VM also runs in Haxe.

1

Soundly Handling Linearity (blog post + paper)

> Soundly handling linearity requires special care in the presence of effect handlers, as the programmer may inadvertently compromise the integrity of a linear resource. For instance, duplicating a continuation that closes over a resource can lead to the internal state of the resource being corrupted or discarding the continuation can lead to resource leakage. Thus a naïve combination of linear resources and effect handlers yields an unsound system.

...

> In the remainder of this blog post we describe a novel approach to rule out such soundness bugs by tracking control-flow linearity, a means to statically assure how often a continuation may be invoked which mediates between linear resources and effectful operations in order to ensure that effect handlers cannot violate linearity constraints on resources. We focus on our implementation in Links. The full technical details are available in our open access POPL'24 distinguished paper Soundly Handling Linearity.

0

Syndicated Actors

> Programming models like the Actor model and the Tuplespace model make great strides toward simplifying programs that communicate. However, a few key difficulties remain. > > The Syndicated Actor model addresses these difficulties. It is closely related to both Actors and Tuplespaces, but builds on a different underlying primitive: eventually-consistent replication of state among actors. Its design also draws on widely deployed but informal ideas like publish/subscribe messaging.

For reference, actors and tuple-spaces are means to implement concurrent programs. Actors are essentially tiny programs/processes that send (push) messages to each other, while a tuple-space is a shared repository of data ("tuples") that can be accessed (pulled) by different processes (e.g. actors).

...

> A handful of Domain-Specific Language (DSL) constructs, together dubbed Syndicate, expose the primitives of the Syndicated Actor model, the features of dataspaces, and the concepts of conversational concurrency to the programmer in an ergonomic way.

...

> To give some of the flavour of working with Syndicate DSL constructs, here's a program written in JavaScript extended with Syndicate constructs: > > javascript > function chat(initialNickname, sharedDataspace, stdin) { > spawn 'chat-client' { > field nickName = initialNickname; > > at sharedDataspace assert Present(this.nickname); > during sharedDataspace asserted Present($who) { > on start console.log(`${who} arrived`); > on stop console.log(`${who} left`); > on sharedDataspace message Says(who, $what) { > console.log(`${who}: ${what}`); > } > } > > on stdin message Line($text) { > if (text.startsWith('/nick ')) { > this.nickname = text.slice(6); > } else { > send sharedDataspace message Says(this.nickname, text); > } > } > } > } >

Documentation

Comparison with other programming models

History

Author's thesis

0

You should make a new programming language

Even though it's very unlikely to become popular (and if so, it will probably take a while), there's a lot you learn from creating a programming language that applies to other areas of software development. Plus, it's fun!

10

First impressions of Gleam: lots of joys and some rough edges

The blog post is the author's impressions of Gleam after it released version 1.4.0. Gleam is an upcoming language that is getting a lot of highly-ranked articles.

It runs on the Erlang virtual machine (BEAM), making it great for distributed programs and a competitor to Elixir and Erlang (the language). It also compiles to JavaScript, making it a competitor to TypeScript.

But unlike Elixir, Erlang, and TypeScript, it's strongly typed (not just gradually typed). It has "functional" concepts like algebraic data types, immutable values, and first-class functions. The syntax is modeled after Rust and its tutorial is modeled after Go's. Lastly, it has a very large community.

2

Zyme: an evolvable language for genetic programming

zyme.dev Zyme - an evolvable language

an esoteric language for genetic programming.

> Zyme is an esoteric language for genetic programming: creating computer programs by means of natural selection. > > For successful evolution mutations must generate a wide range of phenotypic variation, a feat nearly impossible when randomly modifying the code of conventional programming languages. Zyme is designed to maximize the likelihood of a bytecode mutation creating a novel yet non-fatal change in program behavior. > > Diverging from conventional register or stack-based architectures, Zyme uses a unique molecular automaton-based virtual machine, mimicking an abstract cellular metabolism. This design enables fuzzy control flow and precludes invalid runtime states, transforming potential crashes into opportunities for adaptation.

Very unique, even for an esoteric language. Imagine a program that gets put through natural selection and "evolves" like a species: the program is cloned many times, each clone is slightly mutated, the clones that don't perform as well on some metric are discarded, and the process is repeated; until eventually you have programs that do great on the metric, that you didn't write.

0
www.cs.cornell.edu Geometry Bugs and Geometry Types

A special kind of bug exists in code that has to deal with geometric concepts like positions, directions, coordinate systems, and all that. Long ago, in an OOPSLA 2020 paper, we defined geometry bugs and designed a type system to catch them. This post demonstrates the idea through some buggy GLSL sh...

Geometry Bugs and Geometry Types

Key excerpt:

> At OOPSLA 2020, Prof. Dietrich Geisler published a paper about geometry bugs and a type system that can catch them. The idea hasn't exactly taken over the world, and I wish it would. The paper's core insight is that, to do a good job with this kind of type system, you need your types to encode three pieces of information: > > - the reference frame (like model, world, or view space) > - the coordinate scheme (like Cartesian, homogeneous, or polar coordinates) > - the geometric object (like positions and directions) > > In Dietrich's language, these types are spelled scheme<frame>.object. Dietrich implemented these types in a language called Gator with help from Irene Yoon, Aditi Kabra, Horace He, and Yinnon Sanders. With a few helper functions, you can get Gator to help you catch all the geometric pitfalls we saw in this post.

0

Interval Parsing Grammars for File Format Parsing (paper)

dl.acm.org Interval Parsing Grammars for File Format Parsing | Proceedings of the ACM on Programming Languages

File formats specify how data is encoded for persistent storage. They cannot be formalized as context-free grammars since their specifications include context-sensitive patterns such as the random access pattern and the type-length-value pattern. We ...

Interval Parsing Grammars for File Format Parsing | Proceedings of the ACM on Programming Languages

Abstract:

> File formats specify how data is encoded for persistent storage. They cannot be formalized as context-free grammars since their specifications include context-sensitive patterns such as the random access pattern and the type-length-value pattern. We propose a new grammar mechanism called Interval Parsing Grammars IPGs) for file format specifications. An IPG attaches to every nonterminal/terminal an interval, which specifies the range of input the nonterminal/terminal consumes. By connecting intervals and attributes, the context-sensitive patterns in file formats can be well handled. In this paper, we formalize IPGs' syntax as well as its semantics, and its semantics naturally leads to a parser generator that generates a recursive-descent parser from an IPG. In general, IPGs are declarative, modular, and enable termination checking. We have used IPGs to specify a number of file formats including ZIP, ELF, GIF, PE, and part of PDF; we have also evaluated the performance of the generated parsers.

0

The Hitchhiker’s Guide to Logical Verification (book)

Associated class (Brown University cs1951x)

This is a 209-page book on logical verification in Lean (4.0, the "new" version), available as a PDF.

1
www.cs.cornell.edu Bril: An Intermediate Language for Teaching Compilers

I created a new intermediate language, called Bril, for teaching my funky open-source, hands-on compilers course. Because it’s for education, Bril prioritizes simplicity and regularity over more typical compiler priorities like performance and concision. This is an overview of Bril’s design, its qui...

Bril: An Intermediate Language for Teaching Compilers

> I created Bril, the Big Red Intermediate Language, to support the class's implementation projects. Bril isn't very interesting from a compiler engineering perspective, but I think it's pretty good for the specific use case of teaching compilers classes. Here's a factorial program: > > ruby > @main(input: int) { > res: int = call @fact input; > print res; > } > > @fact(n: int): int { > one: int = const 1; > cond: bool = le n one; > br cond .then .else; > .then: > ret one; > .else: > decr: int = sub n one; > rec: int = call @fact decr; > prod: int = mul n rec; > ret prod; > } > > > > Bril is the only compiler IL I know of that is specifically designed for education. Focusing on teaching means that Bril prioritizes these goals: > > - It is fast to get started working with the IL. > - It is easy to mix and match components that work with the IL, including things that fellow students write. > - The semantics are simple, without too many distractions. > - The syntax is ruthlessly regular. > > Bril is different from other ILs because it prioritizes those goals above other, more typical ones: code size, compiler speed, and performance of the generated code. > > Aside from that inversion of priorities, Bril looks a lot like any other modern compiler IL. It's an instruction-based, assembly-like, typed, ANF language. There's a quote from why the lucky stiff where he introduces Camping, the original web microframework, as "a little white blood cell in the vein of Rails." If LLVM is an entire circulatory system, Bril is a single blood cell.

Reference

GitHub

0

GLisp: Graphical Lisp (interactive)

glisp.app (glisp)

A Lisp-based Design Tool Bridging Graphic Design and Computational Arts

(glisp)

GitHub

> Glisp is a Lisp-based design tool that combines generative approaches with traditional design methods, empowering artists to discover new forms of expression.

> Glisp literally uses a customized dialect of Lisp as a project file. As the Code as Data concept of Lisp, the project file itself is the program to generate an output at the same time as a tree structure representing SVG-like list of shapes. And even the large part of the app's built-in features are implemented by the identical syntax to project files. By this nature so-called homoiconicity, artists can dramatically hack the app and transform it into any tool which can be specialized in various realms of graphics -- daily graphic design, illustration, generative art, drawing flow-chart, or whatever they want. I call such a design concept "purpose-agnostic". Compared to the most of existing design tools that are strictly optimized for a concrete genre of graphics such as printing or UI of smartphone apps, I believe the attitude that developers intentionally keep being agnostic on how a tool should be used by designers makes it further powerful.

0
bernsteinbear.com Abstract interpretation in the Toy Optimizer

CF Bolz-Tereick wrote some excellent posts in which they introduce a small IR and optimizer and extend it with allocation removal. We also did a live stream together in which we did some more heap optimizations.

Intro:

> CF Bolz-Tereick wrote some excellent posts in which they introduce a small IR and optimizer and extend it with allocation removal. We also did a live stream together in which we did some more heap optimizations. > > In this blog post, I'm going to write a small abtract interpreter for the Toy IR and then show how we can use it to do some simple optimizations. It assumes that you are familiar with the little IR, which I have reproduced unchanged in a GitHub Gist. > > Abstract interpretation is a general framework for efficiently computing properties that must be true for all possible executions of a program. It's a widely used approach both in compiler optimizations as well as offline static analysis for finding bugs. I'm writing this post to pave the way for CF's next post on proving abstract interpreters correct for range analysis and known bits analysis inside PyPy.

Abstract Interpretation in a Nutshell

0

This blog post explains why algebraic data types are "algebraic" - how every algebraic data type corresponds to a mathematical equation - and describes some ways to use a type's corresponding equation to reason about the type itself.

0

Modal Effect Types (paper)

Abstract:

> We propose a novel type system for effects and handlers using modal types. Conventional effect systems attach effects to function types, which can lead to verbose effect-polymorphic types, especially for higher-order functions. Our modal effect system provides succinct types for higher-order first-class functions without losing modularity and reusability. The core idea is to decouple effects from function types and instead to track effects through relative and absolute modalities, which represent transformations on the ambient effects provided by the context. > > We formalise the idea of modal effect types in a multimodal System F-style core calculus Met with effects and handlers. Met supports modular effectful programming via modalities without relying on effect variables. We encode a practical fragment of a conventional row-based effect system with effect polymorphism, which captures most common use-cases, into Met in order to formally demonstrate the expressive power of modal effect types. To recover the full power of conventional effect systems beyond this fragment, we seamlessly extend Met to Mete with effect variables. We propose a surface language Metel for Mete with a sound and complete type inference algorithm inspired by FreezeML.

Modal logic and modal types have also been used to implement Rust-like "ownership", staged metaprogramming, distributed programming, and more.

0

IELR(1) parser generator insights

IELR(1) a niche LR(1) parser generator. More well-known are LALR and Pager's "minimal" LR(1) algorithm (PGM(1)), but IELR(1) can generate a parser for certain grammars that those cannot. This post by the same authors goes into more detail about the problem IELR(1) solves.

The linked post is about implementing IELR(1), in particular the challenges the authors faced doing so.

They've implemented IERL(1) in their own parser generator, hocc, that they're writing for their own language, Hemlock: "a systems programming language that emphasizes reliable high performance parallel computation" that is (or at least very similar to) an ML dialect.

1

Bio: A Lisp dialect written in Zig

github.com GitHub - cryptocode/bio: A Lisp dialect written in Zig

A Lisp dialect written in Zig. Contribute to cryptocode/bio development by creating an account on GitHub.

GitHub - cryptocode/bio: A Lisp dialect written in Zig

> Bio is an experimental Lisp dialect similar to Scheme, with an interpreter written in Zig > > Features include macros, garbage collection, error handling, a module facility, destructuring, and a standard library. > > Example: > > lisp > (filter > (quicksort '(5 40 1 -3 2) <) > (λ (x) (>= x 0))) > > (1 2 5 40) >

0
blog.sbensu.com We need visual programming. No, not like that.

Why do we keep building visual programming environments? Why do we never use them? What should we do instead?

> Most visual programming environments fail to get any usage. Why? They try to replace code syntax and business logic but developers never try to visualize that. Instead, developers visualize state transitions, memory layouts, or network requests. > > In my opinion, those working on visual programming would be more likely to succeed if they started with aspects of software that developers already visualize.

...

> Developers say they want "visual programming", which makes you think "oh, let's replace if and for". But nobody ever made a flow chart to read for (i in 0..10) if even?(i) print(i). Developers familiar with code already like and understand textual representations to read and write business logic2. > > Let's observe what developers do, not what they say. > > Developers do spend the time to visualize aspects of their code but rarely the logic itself. They visualize other aspects of their software that are important, implicit, and hard to understand. > > Here are some visualizations that I encounter often in serious contexts of use: > > - Various ways to visualize the codebase overall. > - Diagrams that show how computers are connected in a network > - Diagrams that show how data is laid out in memory > - Transition diagrams for state machines. > - Swimlane diagrams for request / response protocols. > > This is the visual programming developers are asking for. Developers need help with those problems and they resort to visuals to tackle them.

8

List of Rust static and dynamic analysis tools organized by type, with:

  • Name
  • Description
  • IR they analyze (HIR, MIR, LLVM IR, etc.)
  • Bug Types
  • Technology
  • Maintenance (1-5 stars, whether they're frequently updated or dead)
2
A real chicken-and-egg situation
  • C++’s mascot is an obese sick rat with a missing foot*, because it has 1000+ line compiler errors (the stress makes you overeat and damages your immune system) and footguns.

    EDIT: Source (I didn't make up the C++ part)

  • Methods Should Be Object Safe
  • I could understand method = associated function whose first parameter is named self, so it can be called like self.foo(…). This would mean functions like Vec::new aren’t methods. But the author’s requirement also excludes functions that take generic arguments like Extend::extend.

    However, even the above definition gives old terminology new meaning. In traditionally OOP languages, all functions in a class are considered methods, those only callable from an instance are “instance methods”, while the others are “static methods”. So translating OOP terminology into Rust, all associated functions are still considered methods, and those with/without method call syntax are instance/static methods.

    Unfortunately I think that some people misuse “method” to only refer to “instance method”, even in the OOP languages, so to be 100% unambiguous the terms have to be:

    • Associated function: function in an impl block.
    • Static method: associated function whose first argument isn’t self (even if it takes Self under a different name, like Box::leak).
    • Instance method: associated function whose first argument is self, so it can be called like self.foo(…).
    • Object-safe method: a method callable from a trait object.
  • Writing an EBNF -> Scheme parser generator using AI for machine-unfriendly descriptions, and further enhancements?
  • I find writing the parser by hand (recursive descent) to be easiest. Sometimes I use a lexer generator, or if there isn’t a good one (idk for Scheme), write the lexer by hand as well. Define a few helper functions and macros to remove most of the boilerplate (you really benefit from Scheme here), and you almost end up writing the rules out directly.

    Yes, you need to manually implement choice and figure out what/when to lookahead. Yes, the final parser won’t be as readable as a BNF specification. But I find writing a hand-rolled parser generator for most grammars, even with the manual choice and lookahead, is surprisingly easy and fast.

    The problem with parser generators is that, when they work they work well, but when they don’t work (fail to generate, the generated parser tries to parse the wrong node, the generated parser is very inefficient) it can be really hard to figure out why. A hand-rolled parser is much easier to debug, so when your grammar inevitably has problems, it ends up taking less time in total to go from spec to working hand-rolled vs. spec to working parser-generator-generated.

    The hand-rolled rules may look something like (with user-defined macros and functions define-parse, parse, peek, next, and some simple rules like con-id and name-id as individual tokens):

    ; pattern			::= [ con-id ] factor "begin" expr-list "end"
    (define-parse pattern
      (mk-pattern
        (if (con-id? (peek)) (next))
        (parse factor)
        (do (parse “begin”) (parse expr-list) (parse “end”))))
    
    ; factor 			::= name-id 
    ; 			 | symbol-literal
    ; 			 | long-name-id 
    ; 			 | numeric-literal 
    ;	 		 | text-literal 
    ; 			 | list-literal 
    ; 			 | function-lambda 
    ; 			 | tacit-arg
    ; 			 | '(' expr ')' 
    (define-parse factor
      (case (peek)
        [name-id? (if (= “.” (peek2)) (mk-long-name-id …) (next))]
        [literal? (next)]
        …))
    

    Since you’re using Scheme, you can almost certainly optimize the above to reduce even more boilerplate.

    Regarding LLMs: if you start to write the parser with the EBNF comments above each rule like above, you can paste the EBNF in and Copilot will generate rules for you. Alternatively, you can feed a couple EBNF/code examples to ChatGPT and ask it to generate more.

    In both cases the AI will probably make mistakes on tricky cases, but that’s practically inevitable. An LLM implemented in an error-proof code synthesis engine would be a breakthrough; and while there are techniques like fine-tuning, I suspect they wouldn’t improve the accuracy much, and certainly would amount to more effort in total (in fact most LLM “applications” are just a custom prompt on plain ChatGPT or another baseline model).

  • Programming Language Scalability (blog)
  • My general take:

    A codebase with scalable architecture is one that stays malleable even when it grows large and the people working on it change. At least relative to a codebase without scalable architecture, which devolves into "spaghetti code", where nobody knows what the code does or where the behaviors are implemented, and small changes break seemingly-unrelated things.

    Programming language isn't the sole determinant of a codebase's scalability, especially if the language has all the general-purpose features one would expect nowadays (e.g. Java, C++, Haskell, Rust, TypeScript). But it's a major contributor. A "non-scalable" language makes spaghetti design decisions too easy and scalable design decisions overly convoluted and verbose. A scalable language does the opposite, nudging developers towards building scalable software automatically, at least relative to a "non-scalable" language and when the software already has a scalable foundation.

  • I'm betting on Call-by-Push-Value
  • I believe the answer is yes, except that we’re talking about languages with currying, and those can’t represent a zero argument function without the “computation” kind (remember: all functions are Arg -> Ret, and a multi-argument function is just Arg1 -> (Arg2 -> Ret)). In the linked article, all functions are in fact “computations” (the two variants of CompType are Thunk ValType and Fun ValType CompType). The author also describes computations as “a way to add side-effects to values”, and the equivalent in an imperative language to “a value which produces side-effects when read” is either a zero-argument function (getXYZ()), or a “getter” which is just syntax sugar for a zero-argument function.

    The other reason may be that it’s easier in an IR to represent computations as intrinsic types vs. zero-argument closures. Except if all functions are computations, then your “computation” type is already your closure type. So the difference is again only if you’re writing an IR for a language with currying: without CBPV you could just represent closures as things that take one argument, but CBPV permits zero-argument closures.

  • Using Go as a compiler backend?
  • Go as a backend language isn’t super unusual, there’s at least one other project (https://borgo-lang.github.io) which chosen it. And there are many languages which compile to JavaScript or C, but Go strikes a balance between being faster than JavaScript but having memory management vs. C.

    I don’t think panics revealing the Go backend are much of an issue, because true “panics” that aren’t handled by the language itself are always bad. If you compile to LLVM, you must implement your own debug symbols to get nice-looking stack traces and line-by-line debugging like C and Rust, otherwise debugging is impossible and crashes show you raw assembly. Even in Java or JavaScript, core dumps are hard to debug, ugly, and leak internal details; the reason these languages have nice exceptions, is because they implement exceptions and detect errors on their own before they become “panics”, so that when a program crashes in java (like tries to dereference null) it doesn’t crash the JVM. Golang’s backtrace will probably be much nicer than the default of C or LLVM, and you may be able to implement a system like Java which catches most errors and gives your own stacktrace beforehand.

    Elm’s kernel controversy is also something completely different. The problem with Elm is that the language maintainers explicitly prevented people from writing FFI to/from JavaScript except in the maintainers’ own packages, after allowing this feature for a while, so many old packages broke and were unfixable. And there were more issues: the language itself was very limited (meaning JS FFI was essential) and the maintainers’ responses were concerning (see “Why I’m leaving Elm”). Even Rust has features that are only accessible to the standard library and compiler (“nightly”), but they have a mechanism to let you use them if you really want, and none of them are essential like Elm-to-JS FFI, so most people don’t care. Basically, as long as you don’t become very popular and make a massively inconvenient, backwards-incompatible change for purely design reasons, you won’t have this issue: it’s not even “you have to implement Go FFI”, not even “if you do implement Go FFI, don’t restrict it to your own code”, it’s “don’t implement Go FFI and allow it everywhere, become very popular, then suddenly restrict it to your own code with no decent alternatives”.

  • Unification-free ("keyword") type checking
  • You probably need to annotate recursive function return values. I know in some languages like Swift and Kotlin, at least in some cases this is required; and other languages have you annotate every function’s return value (or at least, every function which isn’t a lambda expression). So IMO it’s not even as much of a drawback as the other two.