
File Error: integer division or modulo by zero #191

Closed
tiffmkell opened this issue Apr 4, 2020 · 1 comment

@tiffmkell

The gpt-2-simple package was running perfectly with my code on a previous AI Platform VM instance on Google Cloud Platform (TensorFlow 1.14 environment, 1 NVIDIA Tesla T4, Compute Engine default service account).

However, I created a new instance this morning with TensorFlow 1.15 instead of 1.14, and it now throws an error when it tries to train on a corpus saved in a .txt file. The .txt file is fully uploaded to the correct folder, but the model cannot find it, which is why it reports 0 tokens to train on.

This could be a GCP issue, but I still wanted to ask whether there are any dependencies in the package that need to be updated.
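For reference, a minimal pre-flight check along these lines (a sketch, assuming the corpus is the same text_scraped.txt path later passed to gpt2.finetune) would fail fast before training starts instead of crashing inside sample_batch():

import os

file_name = 'text_scraped.txt'  # same path passed to gpt2.finetune as `dataset`

# Fail fast if the corpus is missing or empty; otherwise gpt-2-simple reports
# "dataset has 0 tokens" and later dies with ZeroDivisionError in sample_batch().
assert os.path.isfile(file_name), f"{file_name} not found in {os.getcwd()}"
assert os.path.getsize(file_name) > 0, f"{file_name} is empty"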

Here is the error:

0it [00:00, ?it/s]
Loading dataset...
dataset has 0 tokens
Training...


ZeroDivisionError Traceback (most recent call last)
in
520 # Steps is max number of training steps
521 model = gpt2.finetune(sess, 'text_scraped.txt', model_name = model_name, steps = 1000,
--> 522 run_name = 'dog_beds')
523 model
524

/opt/conda/lib/python3.7/site-packages/gpt_2_simple/gpt_2.py in finetune(sess, dataset, steps, model_name, model_dir, combine, batch_size, learning_rate, accumulate_gradients, restore_from, run_name, checkpoint_dir, sample_every, sample_length, sample_num, multi_gpu, save_every, print_every, max_checkpoints, use_memory_saving_gradients, only_train_transformer_layers, optimizer, overwrite)
340 (_, v_loss, v_summary) = sess.run(
341 (opt_apply, loss, summary_loss),
--> 342 feed_dict={context: sample_batch()})
343
344 summary_log.add_summary(v_summary, counter)

/opt/conda/lib/python3.7/site-packages/gpt_2_simple/gpt_2.py in sample_batch()
307
308 def sample_batch():
--> 309 return [data_sampler.sample(1024) for _ in range(batch_size)]
310
311 if overwrite and restore_from == 'latest':

/opt/conda/lib/python3.7/site-packages/gpt_2_simple/gpt_2.py in <listcomp>(.0)
307
308 def sample_batch():
--> 309 return [data_sampler.sample(1024) for _ in range(batch_size)]
310
311 if overwrite and restore_from == 'latest':

/opt/conda/lib/python3.7/site-packages/gpt_2_simple/src/load_dataset.py in sample(self, length)
81 def sample(self, length):
82 assert length < self.total_size // len(
---> 83 self.chunks
84 ), "Dataset files are too small to sample {} tokens at a time".format(
85 length)

ZeroDivisionError: integer division or modulo by zero
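For context, the assertion message never gets a chance to fire: when the loader finds no usable chunks, len(self.chunks) is 0, so the integer division in the assert condition itself raises ZeroDivisionError. A rough paraphrase of the failing logic in load_dataset.py (not the exact source, just the shape of it):

# Paraphrased from gpt_2_simple/src/load_dataset.py
class Sampler:
    def __init__(self, chunks):
        self.chunks = chunks                                   # [] when the .txt yields 0 tokens
        self.total_size = sum(len(chunk) for chunk in chunks)  # 0 in that case

    def sample(self, length):
        # With zero chunks, `self.total_size // len(self.chunks)` divides by zero
        # before the "Dataset files are too small" message can be raised.
        assert length < self.total_size // len(self.chunks), \
            "Dataset files are too small to sample {} tokens at a time".format(length)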

@tiffmkell (Author)

Randomly restarted the kernel and it began working after about 5 tries.
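If the root cause is the .txt file only becoming visible to the VM after a delay (e.g. a bucket mount still syncing), a small wait-and-retry before starting training would avoid the manual kernel restarts. A sketch, assuming the same text_scraped.txt path:

import os
import time

file_name = 'text_scraped.txt'

# Poll until the corpus is present and non-empty instead of restarting the
# kernel by hand; give up after ~5 minutes.
for _ in range(30):
    if os.path.isfile(file_name) and os.path.getsize(file_name) > 0:
        break
    time.sleep(10)
else:
    raise FileNotFoundError(f"{file_name} never became available")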
