Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZON #20271

Open
wants to merge 21 commits into
base: master
Choose a base branch
from
Open

ZON #20271

wants to merge 21 commits into from

Conversation

MasonRemaley
Copy link
Contributor

@MasonRemaley MasonRemaley commented Jun 12, 2024

ZON

This PR implements ZON, or Zig Object Notation (think JSON, but with Zig syntax instead of Javascript.)

In particular, it implements:

  • Runtime parsing of ZON data
  • Runtime stringification of ZON data
  • Compile time importing of ZON data

A lot of files are added since I added a lot of test cases, the bulk of the work is in src/zon.zig and lib/std/zon. If there's anything I can do to make this easier to review, let me know.

Tests cover all new features.

Runtime

The runtime code can be found at lib/std/zon.

Parsing

lib/std/zon/parse.zig

Detailed doc comments are provided in the source. At a high level, this module provides the following functions and some configuration options:

  • parseFromSlice
  • parseFromAst
  • parseFromAstNoAlloc
  • parseFromAstNode
  • parseFromAstNodeNoAlloc
  • parseFree

Most people will just want parseFromSlice and maybe parseFree, but the more fine grained functions are available for those who need them.

Stringify

lib/std/zon/stringify.zig

Detailed doc comments are provided in the source. At a high level, this module provides the following functions and some configuration options:

  • stringify
  • stringifyMaxDepth
  • stringifyArbitraryDepth
  • stringifier

Most users will just need stringify, or one of its variants.

However, stringifier returns a much more fine grained interface that is used in the implementation of the other functions. It may be used directly if you need to serialize something piece by piece--perhaps to apply different configuration to different parts of the value, or because the value does not actually exist laid out in memory in the same form in which you want to stringify it.

Compile Time

Almost all of the logic for this is in src/zon.zig.

This PR also implements importing ZON at compile time:

const Foo = @import("foo.zon");

Things that may change later...

& In ZON syntax

Right now, ZON slices need to be prefixed with & to distinguish them from tuples. This is necessary because we don't know the result type when importing. In the future I think ZON should not distinguish between tuples and slices, it should just coerce to whatever the destination type is.

Update: & is not allowed in ZON, ZON will automatically coerce tuples to arrays or slices when given a destination type.

Untagged nonexhaustive enum values not yet supported

Zig does not currently have a way to represent an untagged enum value as a literal, and as such there's no way to represent one in ZON.

We could resolve this by adding a syntax for untagged enum literals to Zig, or by allowing ZON to coerce integers to untagged enum literals (provided the destination type is known.) Personally I prefer the former solution.

  • After this PR is merged I'll file an issue to track this if still relevant.

Type mismatches do not always point at helpful location

This is an existing issue, but it can make type errors in ZON imports annoying.

Consider this incorrect code:

    const Struct = struct {
        outer: struct { inner: u8 },
    };
    const some_value = .{ .outer = .{ .inner = 1.5 } };
    const casted: Struct = some_value;

It produces the following error:

src/main.zig:22:28: error: fractional component prevents float value '1.5' from coercion to type 'u8'
    const casted: Struct = some_value;

This isn't super helpful. The location information should be pointed to the struct literal, or a note should be added that points to the struct literal.

This comes up all the time with ZON, e.g. any time you do something like this and make a mistake:

    const Struct = struct {
        outer: struct { inner: u8 },
    };
    const casted: Struct = @import("foo.zon");
  • After this PR is merged I'll file an issue to track this.

Allowing eliding outer braces may make sense

There's case to be made to allow eliding braces on an outer struct in ZON, like this:

.x = 10,
.y = 20,
  • I'll file an issue with some motivating use cases after this is merged, easy change if we decide to make it

Divergences from Zig

Zig has no NaN or inf literals right now. Zon does, since otherwise it would have no way to represent these values.

IMO we should add the same literals to Zig.

  • After this PR is merged I'll file an issue to track this.

See Also

@MasonRemaley MasonRemaley mentioned this pull request Jun 12, 2024
src/Module.zig Outdated Show resolved Hide resolved
@@ -88,6 +88,11 @@ pub const Case = struct {
expect_exact: bool = false,
backend: Backend = .stage2,
link_libc: bool = false,
/// A list of imports to cache alongside the source file.
imports: []const []const u8 = &.{},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added support for importing files into test cases so I could test importing ZON. I'm sure there's a better way, but this seems to work fine as a stop gap. IIUC there are plans to rework this eventually anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also noticed that the existing compiler error tests pass if there is no error when one was expected. It's possible this has been resolved since I noticed it I have not checked recently.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also noticed that the existing compiler error tests pass if there is no error when one was expected.

Oh, that sounds like a problem! Any chance you could check if that's been fixed, and if not try to fix it up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I phrased this poorly--I meant that it was a preexisting bug, not an issue with my changes to the test runner.

I'll check if it's still an issue before merging and file an issue to track it if it is (I don't want to fix it in this PR because it'll likely result in updates to a bunch of unrelated tests.)

src/Module.zig Outdated Show resolved Hide resolved
Copy link
Member

@mlugg mlugg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding & in ZON...

You're correct that ZON definitely shouldn't include this. Our main use of ZON today (build.zig.zon) already creates variable-length structures without this syntax. To avoid a divergence between ZON you can @import and build.zig.zon, I would consider this a merge blocker, but that's Andrew's call of course.

Implementation-wise, the ZIR instruction corresponding to @import should be modified to provide a result type if one is present (you can use .none to represent the case where there isn't one). Then, the ZON analysis logic should traverse into this result type as needed. This has the additional advantage of providing better source locations if the structure of an imported file doesn't match what's expected, since the type error will occur whilst parsing the relevant ZON expression.

src/Module.zig Outdated Show resolved Hide resolved
src/Module.zig Outdated Show resolved Hide resolved
src/Module.zig Outdated Show resolved Hide resolved
src/Module.zig Outdated Show resolved Hide resolved
src/Module.zig Outdated Show resolved Hide resolved
src/Module.zig Outdated Show resolved Hide resolved
@MasonRemaley
Copy link
Contributor Author

MasonRemaley commented Jun 12, 2024

Regarding & in ZON...

You're correct that ZON definitely shouldn't include this. Our main use of ZON today (build.zig.zon) already creates variable-length structures without this syntax. To avoid a divergence between ZON you can @import and build.zig.zon, I would consider this a merge blocker, but that's Andrew's call of course.

Implementation-wise, the ZIR instruction corresponding to @import should be modified to provide a result type if one is present (you can use .none to represent the case where there isn't one). Then, the ZON analysis logic should traverse into this result type as needed. This has the additional advantage of providing better source locations if the structure of an imported file doesn't match what's expected, since the type error will occur whilst parsing the relevant ZON expression.

Hmm I think I agree, better to knock it out now than to release it as is then immediately change the syntax. I would've done this up front but incorrectly thought it required major changes to the compiler.

I'll take a look at making this change. If no result type is specified, I'll have it default to a tuple rather than a slice unless there are any objections to that default.

@mlugg
Copy link
Member

mlugg commented Jun 12, 2024

Yup. I agree that it makes sense to use a tuple when there's no result type.

By the way, in case you're unfamiliar with the AstGen code, you'll just want try rl.ri.resultType(gz, node) orelse .none in the corresponding case in AstGen.builtinCall.

tiehuis added a commit to tiehuis/zig that referenced this pull request Jun 14, 2024
There were two primary issues at play here:
 1. The hex float prefix was not handled correctly when the stream was
    reset for the fallback parsing path, which occured when the mantissa was
    longer max mantissa digits.
 2. The implied exponent was not adjusted for hex-floats in this branch.

Additionally, some of the float parsing routines have been condensed, making
use of comptime.

closes ziglang#20271
tiehuis added a commit to tiehuis/zig that referenced this pull request Jun 14, 2024
There were two primary issues at play here:
 1. The hex float prefix was not handled correctly when the stream was
    reset for the fallback parsing path, which occured when the mantissa was
    longer max mantissa digits.
 2. The implied exponent was not adjusted for hex-floats in this branch.

Additionally, some of the float parsing routines have been condensed, making
use of comptime.

closes ziglang#20271
@andrewrk
Copy link
Member

andrewrk commented Jun 16, 2024

+ echo Looking for non-conforming code formatting...
+ stage3-debug/bin/zig fmt --check .. --exclude ../test/cases/ --exclude ../doc/ --exclude ../build-debug
Looking for non-conforming code formatting...
../test/behavior/zon/a_neg.zon:1:1: error: expected type expression, found '-'
../test/behavior/zon/slice-1.zon:1:1: error: expected type expression, found '&'
../test/behavior/zon/void.zon:1:1: error: expected type expression, found '{'
../test/behavior/zon/slice-empty.zon:1:1: error: expected type expression, found '&'
../test/behavior/zon/false.zon
../test/behavior/zon/slice-abc.zon:1:1: error: expected type expression, found '&'

In case you haven't found why this is happening, it's here:

# TODO: move this to a build.zig step (check-fmt)
echo "Looking for non-conforming code formatting..."
stage3-release/bin/zig fmt --check .. \
--exclude ../test/cases/ \
--exclude ../doc/ \
--exclude ../build-debug \
--exclude ../build-release

Edit: I'll address this in a PR shortly. Sit tight for a bit.

@andrewrk
Copy link
Member

andrewrk commented Jun 17, 2024

After #20321 you should see the same failures locally when running zig build test, and you can check only code formatting conformance with zig build test-fmt.

Edit: I think if you rebase and resolve that build.zig conflict, those CI failures will disappear.

@MasonRemaley
Copy link
Contributor Author

MasonRemaley commented Jun 19, 2024

Nice thanks! I'll do the rebase and get started on removing & from the syntax when I'm back in town next week.

lib/std/zon.zig Outdated Show resolved Hide resolved
@MasonRemaley
Copy link
Contributor Author

MasonRemaley commented Jun 28, 2024

By the way, in case you're unfamiliar with the AstGen code, you'll just want try rl.ri.resultType(gz, node) orelse .none in the corresponding case in AstGen.builtinCall.

Thanks, I'm not familiar with this code so this is helpful!

I started implementing storing the result type on imports, but got a bit stuck.

AstGen used to use addStrTok(...) to store the import path. Since we now want to store both the import path and a result type, I store a Zir.Inst.Bin Heres's what AstGen does now:

// AstGen.builtinCall
const operand_node = params[0];
const rhs = try expr(gz, scope, .{ .rl = .none }, operand_node);
try gz.addPlNode(.import, node, Zir.Inst.Bin{
    .lhs = result_type,
    .rhs = rhs,
});

Zir used to use .str_tok to get the import path, that's also been updated:

// Zir.zirImport
const inst_data = sema.code.instructions.items(.data)[@intFromEnum(inst)].pl_node;
const extra = sema.code.extraData(Zir.Inst.Bin, inst_data.payload_index).data;
const src = block.nodeOffset(inst_data.src_node);
const operand = try sema.resolveConstString(
    block,
    src,
    extra.rhs,
    .{ .needed_comptime_reason = "import path must be comptime-known" },
);

When this code runs, resolveConstString fails:

thread 7410 panic: index out of bounds: index 123056708, len 16
Analyzing lib/std/std.zig
    > %344 = import(%3306982439, %123057044)
      %345 = break_inline(%343, %344)
    For full context, use the command
      zig ast-check -t lib/std/std.zig

  in lib/std/std.zig
    > %436 = decl_val("start")

/home/mason/Documents/zon/zig/src/Sema.zig:299:25: 0x6884afc in contains (zig)
        return map.items[@intFromEnum(key) - @intFromEnum(map.start)] != .none;
                        ^
/home/mason/Documents/zon/zig/src/Sema.zig:255:26: 0x649170c in get (zig)
        if (!map.contains(key)) return null;
                         ^
/home/mason/Documents/zon/zig/src/Sema.zig:1857:39: 0x6491605 in resolveInst (zig)
        const inst = sema.inst_map.get(i).?;
                                      ^
/home/mason/Documents/zon/zig/src/Sema.zig:1887:42: 0x6d84bcf in resolveConstString (zig)
    const air_inst = try sema.resolveInst(zir_ref);
                                         ^
/home/mason/Documents/zon/zig/src/Sema.zig:13928:48: 0x68ae4b4 in zirImport (zig)
    const operand = try sema.resolveConstString(

It appears that the value I get back in Sema for extra.rhs is garbage, instead of the one that I put in during AstGen.

Any pointers would be much appreciated--I can also push the WIP commit if that would be helpful, but I imagine that I'm probably missing something that will be obvious to people more familiar with this part of the codebase.

@mlugg
Copy link
Member

mlugg commented Jun 28, 2024

You probably just need to clear your cache directory -- ZIR is cached based on zig version, so if you haven't created a commit with your changes, the compiler will use the old cached ZIR!

However, your lowering strategy for import isn't really suitable. Since the operand must be a string literal, it's just a waste of bytes and time to use expr to actually create a str instruction. So, you shouldn't use Zir.Inst.Bin as the payload here -- instead, create a new payload type which stores result_ty: Ref, name: NullTerminatedString, and use that as the payload. (For the union field, pl_node is fine; pl_tok would also be fine.)

@MasonRemaley
Copy link
Contributor Author

You probably just need to clear your cache directory -- ZIR is cached based on zig version, so if you haven't created a commit with your changes, the compiler will use the old cached ZIR!

Ha that was it, thanks! I made the same mistake yesterday and didn't quite understand how I fixed it--that had been bugging me glad to have it resolved.

However, your lowering strategy for import isn't really suitable. Since the operand must be a string literal, it's just a waste of bytes and time to use expr to actually create a str instruction. So, you shouldn't use Zir.Inst.Bin as the payload here -- instead, create a new payload type which stores result_ty: Ref, name: NullTerminatedString, and use that as the payload. (For the union field, pl_node is fine; pl_tok would also be fine.)

Makes sense, will do!

src/print_zir.zig Outdated Show resolved Hide resolved
@MasonRemaley
Copy link
Contributor Author

@andrewrk Thanks for the feedback! I'll work through all the review comments, and then ping here again when I'm done.

@VisenDev
Copy link

I hope this PR isn't abandoned! Excited to finally have a ZON parser in the stdlib

@MasonRemaley
Copy link
Contributor Author

@VisenDev it's not abandoned--this feature is very important to my work. I'm working on some other stuff w/ tight deadlines right now, but I'm planning on getting back to this in about a week and a half.

* Gets all tests passing after rebase
* Resolves a bunch of TODOs
* WIP commit while I build release version of main
* Errors on embedded nulls in identifiers
* Parsing identifiers properly without allocating
* Properly frees parsed idents
* Fixes integer negation logic
* Uses existing big int parser
* Cleans up runtime int parsing
* Some investigation into float issue
* More float investigation
* Adds missing test
* Resolves TODOs
* Fixes some error tokens
* Cleans up remaining parsing todos
* Removes TODOs
* Excludes tests format formatting
* Formats
* Parses directly as desired float type
* Moves zon lowering into own file

Records result type for imports, does not yet use

Fixes print zir for imports, better encoding of instruction

Uses the result type to add indirection where necessary

Does not yet remove & from syntax

Removes & from comptime ZON syntax

Yet to be removed from runtime syntax

Fixes after rebase

Allow coercion to slices, not to any other pointer type, adds tests

No longer allows `&` in the runtime parser

Also fixes merge

Fixes bug where you could parse structs as tuples and vice versa

Also cleans up error handling

Reworks runtime error handling

Explains usage of unreachable

Fixes CI failure, and exposes notes in API

Shows notes on integer literal errors when parsing ZON

Fixes free of undefined value if struct with allocated fields is missing a field

Skips tests failing due to C backend issue

Notes why tests are skipped, renames to make easier to use test filter
@andrewrk
Copy link
Member

FYI the last several CI runs have all failed the same way: compile errors when targeting a 32-bit system. You can easily reproduce with zig build -Dtarget=arm-linux.

@MasonRemaley
Copy link
Contributor Author

MasonRemaley commented Nov 13, 2024

Thanks for the explanation! I don't expect the CI to succeed right now as I'm partway through refactoring to parse imports to a known result type.

(Locally, the tests that I expect to pass at this point are all passing, and I expect all of them will on CI too once I'm done--if not I'll try this.)

[EDIT] Totally missed that you said compile error. You explained in person, my last commit likely fixes the issue, if it doesn't I'll test it locally to sort it out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants