[Feature Request] Get the comments of .toml #17

Open
s3xysteak opened this issue Apr 25, 2024 · 5 comments
Labels
enhancement New feature or request feedback wanted Additional feedback is needed

Comments

@s3xysteak

[features]
# Hi!
foo = "bar"

After parsing it and stringifying the result, it becomes:

[features]
foo = "bar"

Is it possible to keep the comments? Maybe like:

const content = parse(exampleAbove)
const target = {
  features: {
    __comment_1: ' Hi!',
    foo: 'bar'
  }
}
@cyyynthia
Member

The parser step is currently quite lossy, as it doesn't preserve comment information (nor formatting). If you read a toml file and stringify it back, it will produce a "full" table; which means mostly the same thing but is most likely unwanted.

Because the lib maps toml files to plain objects (without intermediary attributes), I'm not sure how to keep track of comments (and possibly other metadata such as "is this table an inline table" for formatting purposes) in the output. I'm not sure I like the extra key way, as this doesn't preserve information about where the comment was...

I thought about having the library expose metadata via content[Symbol('smol-toml::metadata')], that'd be able to hold everything but I'm not sure about that approach either. It would allow any data to be in there so it could preserve everything (formatting, comments, etc...) making the lib suitable for non-destructive toml editing.

Do you have any opinions on this?
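
A minimal sketch of the symbol-keyed idea, for comparison with the `__comment_1` extra-key proposal. The names `METADATA` and `attachMetadata` and the shape of the metadata object are assumptions made for illustration, not smol-toml APIs. One detail worth noting: `Symbol('...')` creates a new symbol on every call, so a lookup written literally as `content[Symbol('smol-toml::metadata')]` would never match; the sketch assumes the symbol is shared through `Symbol.for` (exporting a single symbol from the library would work as well).

```javascript
// Hypothetical names throughout: METADATA and attachMetadata are not part of smol-toml.
// Symbol.for() returns the same symbol for the same key on every call.
const METADATA = Symbol.for('smol-toml::metadata')

function attachMetadata(data, metadata) {
  // Non-enumerable, so object spread and Object.assign skip it as well.
  Object.defineProperty(data, METADATA, { value: metadata, enumerable: false, writable: true })
  return data
}

const content = attachMetadata(
  { features: { foo: 'bar' } },
  // Placeholder metadata shape, only here to have something to read back.
  { comments: [{ path: ['features', 'foo'], above: [' Hi!'] }] },
)

console.log(Object.keys(content))                   // [ 'features' ]
console.log(JSON.stringify(content))                // {"features":{"foo":"bar"}}
console.log(content[METADATA].comments[0].above)    // [ ' Hi!' ]
console.log(content[Symbol('smol-toml::metadata')]) // undefined: a fresh symbol never matches
```

Unlike an extra string key, the symbol-keyed entry stays out of `Object.keys` and `JSON.stringify`, so consumers that only care about the data never see it.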

@cyyynthia cyyynthia added enhancement New feature or request feedback wanted Additional feedback is needed labels Apr 25, 2024
@andriemc

comments could work like they do in hjson (the js library for human json), where all comments are stored in a _comments_ array in the data object, and when it's stringified back into hjson, it adds the comments. (they also do it with whitespace, but since this is toml we don't really need that in smol-toml)

@s3xysteak
Author

jsonc-parser can also keep comments. I recently used it to edit .vscode/settings.json while keeping the comments. Hope it's useful to you!

@cyyynthia
Member

cyyynthia commented Sep 30, 2024

I thought about it for a while. I haven't yet settled on the impl details (likely the top-level symbol), but here are the things I thought about and/or are worth noting (including scrapped ideas):

  • smol-toml's goal (at least for the time being) isn't to produce byte-identical strings. the goal is best-effort comment-preserving editing of toml; precise formatting details will always be discarded (indents normalized, spaces, etc.).

  • (scrapped..?) since the notion of document is completely destroyed after parsing, comments need to be anchored to "something" so they can be addressed. we can define rules that determine what each comment is treated as.

Comment semantic example
# Document Header
# Document Header

# first_key Comment Header (1)

# first_key Comment Header (2)
# first_key Comment Header (2)
first_key = "meow"
# first_key Comment Footer

# second_key Comment Header

# second_key Comment Header
second_key = "meow"

# third_key Comment Header (not footer of second_key due to newline)

third_key = "meow"

# Document Footer

problem with this approach is that it makes newlines have an importance semantically. Misses the mark wrt principle of least astonishment. Probably very annoying to deal with (e.g. automatic docgen).
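
To make the concern concrete, here is a rough sketch of what blank-line anchoring could look like. `anchorComments` is a hypothetical helper, not smol-toml code, and the rules it applies are one possible reading of the example above: a block directly under a key is that key's footer, the first block followed by a blank line is the document header, any other block is a header for the next key, and blocks with no key after them form the document footer. It only handles blank lines, full-line comments and `key = value` lines.

```javascript
// Hypothetical helper: anchorComments is not a smol-toml function.
// Assumption: input only has blank lines, full-line comments and `key = value` lines.
function anchorComments(source) {
  const result = { header: [], keys: {}, footer: [] }
  let pending = []   // comment blocks waiting for the next key
  let block = null   // comment block currently being collected
  let prev = 'start' // kind of the previous line
  let lastKey = null

  const closeBlock = (followedByBlank) => {
    if (!block) return
    const isFirstBlock = lastKey === null && pending.length === 0 && result.header.length === 0
    if (block.afterKey !== null) result.keys[block.afterKey].footer.push(block.lines)
    else if (isFirstBlock && followedByBlank) result.header.push(block.lines)
    else pending.push(block.lines)
    block = null
  }

  for (const raw of source.split('\n')) {
    const line = raw.trim()
    if (line.startsWith('#')) {
      // A block that starts right under a key line is that key's footer.
      if (!block) block = { lines: [], afterKey: prev === 'key' ? lastKey : null }
      block.lines.push(line)
      prev = 'comment'
    } else if (line === '') {
      closeBlock(true)
      prev = 'blank'
    } else {
      closeBlock(false)
      lastKey = line.split('=')[0].trim()
      result.keys[lastKey] = { header: pending, footer: [] }
      pending = []
      prev = 'key'
    }
  }
  closeBlock(false)
  result.footer = pending // blocks with no key after them
  return result
}

const sample = [
  '# Document Header',
  '',
  '# first_key Comment Header',
  'first_key = "meow"',
  '# first_key Comment Footer',
  '',
  '# second_key Comment Header',
  '',
  'second_key = "meow"',
  '',
  '# Document Footer',
].join('\n')

const anchored = anchorComments(sample)
// Deleting a single blank line silently re-attaches a comment to another key.
const moved = anchorComments(sample.replace('Footer\n\n# second', 'Footer\n# second'))
console.log(anchored.keys.second_key.header) // [ [ '# second_key Comment Header' ] ]
console.log(moved.keys.second_key.header)    // []
```

The last two lines show the astonishment problem: removing one blank line turns the header of second_key into part of the footer of first_key.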

  • (considered, has issues) strings are always references in JS. it is free (or rather, only costs a compressed pointer) to add a string to an array. with that knowledge, we can use a much simpler approach of defining comments before and after a given key. the comments between 2 keys are both referenced as after key1 and before key2. once a key is encountered, no longer add comments to the list of comments below.

while very simple, this assumes the key order will be consistent, which is not guaranteed. key order is entirely unpredictable. at least, the spec says so, despite all major impls giving back keys in the order of insertion. but this can be changed very easily at runtime and is unpredictable. it is more usable than the other option, but this issue is annoying me as it makes it difficult to reconstruct the document (unless comments must come with information about key order and preserve it at all costs).

another issue is where to add comments. for existing keys, below key1 and above key2 will most likely use the same array as an optimization, so that's a single place to add comments. for new keys, using top and bottom unconditionally seems reasonable for stringify: commentsBeforeK1 K1 commentsAfterK1 commentsBeforeK2 K2 commentsAfterK2, note the two lists between K1 and K2.

another issue is with regards to table headers. it may be possible to change the rules of the "use both arrays in between" by doing so if key1.below !== key2.above. This however is insufficient to handle the case of keys in-between table headers... another problem is how to deal with comments inside arrays. that's completely stupid for sure, but has to be dealt with ugh... maybe treating all children as a key and using arr.0, arr.1, etc can do the trick. or have arrays specify an array to map these (e.g. [null, {before:..., after:...},{before:...,after:...},null,null]).
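
A small sketch of the shared-array idea and of the `key1.below !== key2.above` check, under stated assumptions: the `above` / `below` property names and the `render` function are made up for illustration, and values are quoted with `JSON.stringify` as a simplification rather than with a real TOML serializer.

```javascript
// Hypothetical shapes: `above` / `below` and render() are illustrative names.
const between = ['# comment between key1 and key2']
const comments = {
  key1: { above: [], below: between },
  key2: { above: between, below: [] }, // same array object, not a copy
}
const values = { key1: 'meow', key2: 'meow' }

// Adding through either side is visible from both sides.
comments.key2.above.push('# added later')

function render(order) {
  const out = []
  order.forEach((key, i) => {
    const prev = order[i - 1]
    // Skip `above` only when it is the very array just printed as `below`.
    const shared = prev !== undefined && comments[prev].below === comments[key].above
    if (!shared) out.push(...comments[key].above)
    out.push(`${key} = ${JSON.stringify(values[key])}`) // JSON quoting: a simplification
    out.push(...comments[key].below)
  })
  return out
}

const inOrder = render(['key1', 'key2'])
const reordered = render(['key2', 'key1'])
console.log(inOrder.join('\n'))
console.log(reordered.join('\n'))
```

With the original order the shared comments are printed once. With the keys swapped, the identity check no longer detects the sharing and the same comments are printed both above key2 and below key1, which is the key-order fragility described above.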

  • parsed comments will be returned completely untrimmed, in case spacing has a meaning to the consumer. this raises the question of how to format newly added comments. expecting users to add the spacing themselves is not great, but producing comments glued to the # is not pretty. not a big deal, but a nit that bothers me nonetheless
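
One possible compromise for newly added comments, sketched below. `renderComment` and its `autoSpace` option are hypothetical, and the sketch assumes comment text is stored without the leading `#` (as in the `' Hi!'` example at the top of the thread): text that already starts with whitespace is emitted verbatim, and a single space is inserted only when the text starts with a non-space character.

```javascript
// Hypothetical helper, not a smol-toml API.
// Assumption: comment text is stored without the leading '#'.
function renderComment(text, { autoSpace = true } = {}) {
  if (/[\r\n]/.test(text)) throw new TypeError('a TOML comment cannot span lines')
  // Respect spacing chosen by the caller (or preserved by the parser).
  const needsSpace = autoSpace && text !== '' && !/^\s/.test(text)
  return '#' + (needsSpace ? ' ' : '') + text
}

console.log(renderComment(' Hi!'))                       // "# Hi!"  (parsed text, untouched)
console.log(renderComment('Hi!'))                        // "# Hi!"  (new text, space added)
console.log(renderComment('Hi!', { autoSpace: false }))  // "#Hi!"
console.log(renderComment('   indented'))                // "#   indented"
```

This keeps round-tripped comments byte-for-byte identical after the `#` while sparing users from adding the space themselves.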

sorry if the comment is a bit "raw" and hard to read through, it's quite frankly a core dump of my brain lmao. i'll try to write a concrete example of the impl under consideration and the result object tomorrow. if not, i'll do my best to come up with an example this week.

@cyyynthia
Member

cyyynthia commented Oct 13, 2024

Sorry for the very late PoC, I've had a lot on my plate lately. So, here is the thing I had in mind. It produces a big object, but I think that's the price to pay for an exhaustive representation and good support for not-too-destructive editing of toml documents, among other things.

While manual traversal of the object is possible, it'd probably get annoying quite fast due to the single-letter keys used. Helper functions would be provided to make it easier to retrieve (and append) comments of the document.

TOML Document
# This is an example TOML document with comments
# To showcase how smol would extract and represent them

# Comment right above first key
first_key = "Hello, world!"
# Comment right below first key

# Comment in-between keys

# Comment right above second key
second_key = "Hello, world!"
# Comment right below second key

third_key = "Hello, world!" # Comment right after the key

fourth_key = [ 1, 2, # Comment in the middle of the array!
3, 4 ]

# Comment above header
[header]
[header.sub1]
# Comment between sub-headers
[header.sub2]
fifth_key = "Hello, world!" # Did it work?

# This is a footer comment.
Expected output
const obj = {
	first_key: "Hello, world!",
	second_key: "Hello, world!",
	third_key: "Hello, world!",
	fourth_key: [1, 2, 3, 4],
	header: {
		sub1: {},
		sub2: {
			fifth_key: "Hello, world!"
		}
	},
	[Symbol('smol-toml::metadata')]: {
		/* styles (inferred) */ s: ..., // reserved for #13
		/* order (root) */ o: {
			/* order */ o: [ 'first_key', 'second_key', 'third_key', 'fourth_key', 'header' ],
			/* children */ c: {
				header: {
					/* order */ o: [ 'sub1', 'sub2' ],
					/* children */ c: {
						sub2: {
							/* order */ o: [ 'fifth_key' ],
						},
					},
				},
			},
		},
		/* comments */ c: {
			/* header */ h: &1 [
				[
					'# This is an example TOML document with comments',
					'# To showcase how smol would extract and represent them',
				],
				[
					'# Comment right above first key',
				],
			],
			/* keys*/ k: {
				first_key: {
					/* above */ a: *1 [...],
					/* below */ b: &2 [
						[ '# Comment right below first key' ],
						[ '# Comment in-between keys' ],
						[ '# Comment right above second key' ],
					],
					/* stuck */ s: 3, // 0b11 - above & below stuck to key
				},
				second_key: {
					/* above */ a: *2 [...],
					/* below */ b: &3 [[ '# Comment right below second key' ]],
					/* stuck */ s: 3, // 0b11 - above & below stuck to key
				},
				third_key: {
					/* above */ a: *3 [...],
					/* next */ n: '# Comment right after the key',
					/* stuck */ s: 0, // 0b00 - no comment stuck to key
				},
				fourth_key: {
					/* inline */ i: [
						[],
						[['# Comment in the middle of the array!']],
						[],
						[],
					],
					/* below */ b: *4 [...],
					/* stuck */ s: 0, // 0b00 - no comment stuck to key
				},
				header: {
					/* above */ a: &4 [[ '# Comment above header' ]],
					/* stuck */ s: 1, // 0b01 - above stuck to key
					/* children */ c: {
						sub1: {
							/* below */ b: &5 [[ '# Comment between sub-headers' ]],
							/* stuck */ s: 2, // 0b10 - below stuck to key
						},
						sub2: {
							/* above */ a: *5 [...],
							/* stuck */ s: 1, // 0b01 - above stuck to key
							/* children */ c: {
								fifth_key: {
									/* next */ n: '# Did it work?',
									/* below */ b: *6 [...],
									/* stuck */ s: 0, // 0b00 - no comment stuck to key
								},
							},
						},
					},
				},
			},
			/* footer */ f: &6 [[ '# This is a footer comment.' ]],
		},
	},
}
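
A sketch of what one of the helper functions mentioned above might look like, reading the single-letter layout from the expected output (`c` for comments, `k` for keys, nested `c` for children, `a` / `b` / `n` / `i` / `s` on each key). `commentsFor` and the constant names are hypothetical, and sharing the symbol through `Symbol.for` is an assumption. The sample object reproduces only the `[header]` part of the example, with plain variables standing in for the `&5` / `*5` style shared references.

```javascript
// Hypothetical helper and constant names; not part of smol-toml.
const METADATA = Symbol.for('smol-toml::metadata') // assumption: how the symbol is shared
const STUCK_ABOVE = 0b01
const STUCK_BELOW = 0b10

function commentsFor(doc, path) {
  let level = doc[METADATA]?.c?.k // comments -> keys
  let node
  for (const key of path) {
    node = level?.[key]
    if (!node) return null
    level = node.c // children of the current key
  }
  if (!node) return null // empty path
  const stuck = node.s ?? 0
  return {
    above: node.a ?? [],
    below: node.b ?? [],
    next: node.n ?? null,
    inline: node.i ?? [],
    stuckAbove: (stuck & STUCK_ABOVE) !== 0,
    stuckBelow: (stuck & STUCK_BELOW) !== 0,
  }
}

// Shared arrays, standing in for the &5 / *5 and &6 / *6 references.
const betweenSubs = [['# Comment between sub-headers']]
const footer = [['# This is a footer comment.']]

const doc = {
  header: { sub1: {}, sub2: { fifth_key: 'Hello, world!' } },
  [METADATA]: {
    c: {
      k: {
        header: {
          a: [['# Comment above header']],
          s: 1,
          c: {
            sub1: { b: betweenSubs, s: 2 },
            sub2: {
              a: betweenSubs,
              s: 1,
              c: { fifth_key: { n: '# Did it work?', b: footer, s: 0 } },
            },
          },
        },
      },
      f: footer,
    },
  },
}

console.log(commentsFor(doc, ['header', 'sub2', 'fifth_key']).next) // '# Did it work?'
console.log(commentsFor(doc, ['header', 'sub1']).stuckBelow)        // true
console.log(commentsFor(doc, ['header', 'missing']))                // null
```

Returning named fields from the helper hides the single-letter keys and the bit flags from callers, so the compact storage format can stay an internal detail.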
