Prefix and suffix and their appearance in generated samples #40
It would have been better to open an issue for each question. Anyway:
@woctezuma thank you for the quick reply and for creating a new issue for question 1. I partially understand your answer to question 2: these tags are used to distinguish between documents within a large file. However:
I do not completely understand this. With the settings
Thanks again for your time, and I hope you do not mind answering these questions.
Caveat: this is all my understanding of the program; I might be wrong.
@woctezuma Thank you so much for your explanation, the answers are very clear! :)
I have a small dataset (~2MB) consisting of columns written by a journalist throughout recent years. Each column is prepended with '<|startoftext|>' and appended with '<|endoftext|>'. I have two questions:
gpt2.generate(sess, length=310, temperature=0.7, prefix="<|startoftext|>", include_prefix=False, truncate="<|endoftext|>", nsamples=5, batch_size=5)
the prefix and suffix still appear in the middle of the texts. Am I doing something wrong, or is this normal?
Although I have spent quite some time on GPT-2 by now, I still find it hard to grasp this part, so any help is greatly appreciated.
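For anyone hitting the same symptom, one possible workaround is to post-process the generated strings yourself. The sketch below is not part of gpt-2-simple; `split_samples` and `first_document` are hypothetical helper names, and it assumes the samples are already available as plain Python strings that may still contain the '<|startoftext|>' and '<|endoftext|>' delimiters.

```python
import re

# Delimiter tokens used in the training file (taken from the question above).
START = "<|startoftext|>"
END = "<|endoftext|>"


def split_samples(raw_text, start=START, end=END):
    """Hypothetical helper: split one raw sample into separate documents.

    Every occurrence of either delimiter is treated as a document boundary,
    and the delimiters themselves are dropped from the output.
    """
    pattern = "|".join(re.escape(tok) for tok in (start, end))
    pieces = re.split(pattern, raw_text)
    # Discard empty or whitespace-only fragments left between adjacent tokens.
    return [p.strip() for p in pieces if p.strip()]


def first_document(raw_text, start=START, end=END):
    """Hypothetical helper: keep only the first document of a raw sample."""
    docs = split_samples(raw_text, start, end)
    return docs[0] if docs else ""


raw = "First column.<|endoftext|>\n<|startoftext|>Second column, cut off"
print(split_samples(raw))
print(first_document(raw))
```

If the samples are collected as a list of strings (for instance with `return_as_list=True` in `gpt2.generate`), each one can be passed through `first_document` to keep only the text before the first delimiter.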