Add File API proposal design document #3101
LGTM in general!
tl;dr:
- I am for only async functions, even at the cost of some ergonomics in the short term.
- A smaller API will be easier to stabilize, rewrite as we need, and expand on from the current code. As such I am for dropping the majority of the "porcelain".
- While this code improves things, it is not very useful until we get streams and support for streams across k6.
The initial API will mostly be asynchronous, except for the `open` functionality which will be synchronous due to the current lack of support for `await` operations within the init context.

The API will have the following characteristics:
- Load file content exactly once into memory.
Well... the underlying implementation already loads it once, and if it is over 100kb it likely loads it a couple more times. This is blocked on removing afero.
But hopefully this will be a lot better.
👍🏻
So removing `afero` is part of the implementation? Should we have a crude but working PoC with it? We should also link the issue here.
I wouldn't argue removing `afero` is part of the implementation.
afero adds +1 to the number of times we load the file in memory, and there is another +1-2x if it is a fairly big file.
This second allocation, though, is temporary and will be reused. You can see a more detailed explanation here
So we will have at least 2 copies of the file in memory, which is far from ideal, but definitely a lot better than having a copy per VU.
The proposal as it stands, while having some value, does not make it much more useful to open big files, because the moment you have to use them you still need to start copying. So dropping these 2 copies is likely negligible compared to getting streams and not copying on the "other side", where this data will be used by something.
I am not against removing afero entirely and would like to prioritise it again, but I don't think this is our biggest problem.
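To put rough numbers on the "2 copies versus a copy per VU" comparison, here is a back-of-the-envelope sketch. The function name and the sizes in the usage note are made up for illustration, and it deliberately ignores the temporary extra copies described above.

```typescript
// Hypothetical helper, not part of k6: compares steady-state memory use of
// one private copy per VU against a fixed two shared copies.
function estimateBytes(fileBytes: number, vus: number) {
  return {
    // Every VU holds its own copy of the file content.
    perVuCopies: fileBytes * vus,
    // Two copies in total, independent of the number of VUs.
    sharedCopies: fileBytes * 2,
  };
}
```

With an assumed 100-byte file and 50 VUs, `estimateBytes(100, 50)` gives 5000 bytes for per-VU copies and 200 bytes for the shared case. The gap grows linearly with the VU count, which is why the two remaining copies matter less than the per-VU ones.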
I would also keep removing afero out of scope for this design.
Thanks for the heads-up on afero; I didn't have the full context in mind regarding how it handles memory itself 🙇🏻 I'll try to rephrase to be more accurate.
* Resolves to a `FileInfo` describing the file.
*/
stat(): Promise<FileInfo>
}
No `close`?
Whether we need one or not depends a lot on the underlying implementation, so I was on the fence. As a safety net, and for the API to feel intuitive, it's probably better to have one. Adding it 👍
I think `File.close()` would be confusing with the current implementation, considering `open()` returns a slice with the file data.
So what would `close()` actually do? It certainly wouldn't close the file, as the comment here states. If we have the concept of a `FileHandle`, then `close()` could make it unusable from that point on? Not sure what the purpose of that would be, since `FileHandle`s shouldn't have any expensive resources to free.
I get that a counterpart to `open()` would be intuitive, but since we're not dealing with traditional file handles here, a `close()` doesn't make sense to me.
Maybe instead of a top-level `open()` function, a `File` constructor would be more intuitive? E.g.:
    let f = new File('/path');
    f.read() ...
This would make it clear that `File` handles are cheap and disposable.
Regarding `close`: I had left it out of the proposal and POC implementation, as it was not necessary with the way I approached solving the problem on the implementation side. This discussion is a good opportunity to mention that I didn't want to assume what the precise underlying implementation would be. However, I had thought of one potential use of `close`: the file registry I used in the POC to keep file contents in memory, and provide a pointer to the data to VUs, could keep a reference count per file of the VUs using it, and once a file is not referenced anymore (because all VUs closed it), it could explicitly drop it. But I'm not entirely convinced that's really useful in practice.
Regarding using a constructor as opposed to `open`, I'm not opposed to it, but I would personally prefer to stick to an API that resembles what you find in OSes and other languages. On another note that may prove interesting: in my initial implementation of the POC, the `File` struct was actually called `FileView`, to reflect that it's a view of the file content, cannot be modified, and is a "cheap and disposable" handle.
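For readers following along, the reference-counting idea could look roughly like the sketch below. It is only an illustration under assumptions: `FileRegistry`, its methods, and the injected `load` callback are hypothetical names, not the POC's actual code, and it counts `open` calls rather than distinct VUs to keep things short.

```typescript
// Hypothetical sketch: none of these names come from the k6 code base.
type Loader = (path: string) => Uint8Array;

class FileRegistry {
  // path -> loaded bytes plus the number of handles currently open on them.
  private entries = new Map<string, { data: Uint8Array; refs: number }>();
  // Stands in for whatever actually reads the file from disk.
  private load: Loader;

  constructor(load: Loader) {
    this.load = load;
  }

  // Returns the shared bytes, loading them only on the first open of a path.
  open(path: string): Uint8Array {
    let entry = this.entries.get(path);
    if (entry === undefined) {
      entry = { data: this.load(path), refs: 0 };
      this.entries.set(path, entry);
    }
    entry.refs += 1;
    return entry.data;
  }

  // Drops one reference; the content is released once nobody holds it.
  close(path: string): void {
    const entry = this.entries.get(path);
    if (entry === undefined) return;
    entry.refs -= 1;
    if (entry.refs <= 0) this.entries.delete(path);
  }

  // True while the content for `path` is still held by the registry.
  has(path: string): boolean {
    return this.entries.has(path);
  }
}
```

In this sketch, `close` only matters for memory: the registry forgets the content once the last holder has closed it, and a later `open` would load it again.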
> I think File.close() would be confusing with the current implementation

`implementation` is the key word here.
And this is an API proposal. If, after making this better, we need to break the API because `close` now needs to be called in order for it to work, that will be bad.
I would expect that any file you open, you will be able to close. Whether that has a significant impact is a good question.
If there was a different way to tell us "I no longer need this file", I would be fine with it. But at this point this is not possible AFAIK. There are some resource management TC39 proposals that I am not going to discuss, as they are just proposals and would still require that we have a `close`-like method to tell us that a file is no longer needed. So ... not really a solution.
I am okay with this not having `close` for now - I do expect we will likely need to iterate quite a lot on all the changes around the epic "support big files in k6". But I wanted to lay out my reasoning for wanting a `close`.

> Maybe instead of a top-level open() function, a File constructor would be more intuitive? E.g.:

While I am not against it in principle, I don't see why that would be better.
If anything, the current API is closer to what is in Deno and in other places, and consequently might be reusable across platforms. Or, more likely, easier to support when someone wants to reuse a library. So ... 👎 on my side.
My point is that `open()` and `close()` only make sense if the file itself is opened and closed. But since we'll only open the file and read all its data once, and each subsequent call to `open()` will return a view of the data, calling `close()` on this view would be confusing if people are expecting the file to be closed.
This behavior will not change in a different implementation, since it's the core of what we want to accomplish. So the naming doesn't quite align with the behavior, and repurposing the name used in other frameworks to forcefully align it wouldn't make it more intuitive--quite the contrary. As a user, I'd rather have the API clearly reflect the behavior, than have to remap my existing assumptions of what `open()` and `close()` do.
I suggested using a constructor instead of `open()` since it doesn't imply that a `close()` would be needed. But I'm not sold on it either, and we could consider alternatives like `FileView`.
Anyway, this is not a blocker from my side. So if you strongly feel we should keep it as is, I'm fine with it.
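To make the "read once, hand out views" behavior concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `FileView`, `openView`, the module-level `cache`, and the Deno-like `read(into)` signature are hypothetical and are not taken from the proposal or the POC.

```typescript
// Hypothetical names throughout; this is not the POC's code.
class FileView {
  // Each view tracks its own position; the bytes themselves are shared.
  private offset = 0;
  private readonly data: Uint8Array;

  constructor(data: Uint8Array) {
    this.data = data;
  }

  // Copies up to `into.length` bytes into `into` and returns the count read,
  // or null once the view has reached the end of the data.
  read(into: Uint8Array): number | null {
    if (this.offset >= this.data.length) return null;
    const chunk = this.data.subarray(this.offset, this.offset + into.length);
    into.set(chunk);
    this.offset += chunk.length;
    return chunk.length;
  }
}

// Content loaded so far, keyed by path.
const cache = new Map<string, Uint8Array>();

// Calls `load` only the first time a path is seen; every call returns a
// fresh view over the same underlying bytes.
function openView(path: string, load: (path: string) => Uint8Array): FileView {
  let data = cache.get(path);
  if (data === undefined) {
    data = load(path);
    cache.set(path, data);
  }
  return new FileView(data);
}
```

Under these assumptions a view owns nothing but an offset, so a `close()` on it could at most make that one handle unusable. The file itself was read during the first `openView` call and is not held open afterwards.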
Codecov Report
@@ Coverage Diff @@
## master #3101 +/- ##
==========================================
+ Coverage 73.70% 73.76% +0.06%
==========================================
Files 239 241 +2
Lines 18258 18457 +199
==========================================
+ Hits 13457 13615 +158
- Misses 3933 3970 +37
- Partials 868 872 +4
567da3b to e045861
e045861 to e2b8dda
Great work @oleiade! 👏
I glanced at the PoC, and it looks reasonable. 👍
1fe97c7 to 15be818
* Add File API proposal design document
* File API design document revision 1
* File API design document revision 2
This Pull Request contains a design document for a File API proposal that intends to tackle #2977 as part of the #2974 epic.
There is a POC branch available containing a very rough, but functional, implementation of it.
Looking forward to your feedback and suggestions 🙇🏻