I am following the tutorial from Andrej K. on building GPT-2 from scratch. I thought it would be a good idea to visualize his GPT-2 model using torchexplorer.
This is what I did:
installed torchexplorer (Windows 10, using the pygraphviz build from the alubbock channel)
placed torchexplorer.watch(model, log=['io', 'params'], backend='standalone') before calling the model
ran one step (step 0) with a full forward and backward pass
checked localhost:8080
However, it seems to only capture the input and output, excluding the whole network. Any thoughts?
I started with his first commit in the project to avoid more complex operations like DDP, context managers, etc.
Here is the code.
import tiktoken
import torch
import torchexplorer

class DataLoaderLite:
    def __init__(self, B, T):
        self.B = B
        self.T = T
        # at init load tokens from disk and store them in memory
        with open('input.txt', 'r') as f:
            text = f.read()
        enc = tiktoken.get_encoding('gpt2')
        tokens = enc.encode(text)
        self.tokens = torch.tensor(tokens)
        print(f"loaded {len(self.tokens)} tokens")
        print(f"1 epoch = {len(self.tokens) // (B * T)} batches")
        # state
        self.current_position = 0

    def next_batch(self):
        B, T = self.B, self.T
        buf = self.tokens[self.current_position : self.current_position + B * T + 1]
        x = (buf[:-1]).view(B, T)  # inputs
        y = (buf[1:]).view(B, T)   # targets
        # advance the position in the tensor
        self.current_position += B * T
        # if loading the next batch would be out of bounds, reset
        if self.current_position + (B * T + 1) > len(self.tokens):
            self.current_position = 0
        return x, y

# -----------------------------------------------------------------------------
# attempt to autodetect the device
device = "cpu"
if torch.cuda.is_available():
    device = "cuda"
elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
    device = "mps"
print(f"using device: {device}")

torch.manual_seed(1337)
if torch.cuda.is_available():
    torch.cuda.manual_seed(1337)

train_loader = DataLoaderLite(B=4, T=32)

# get logits
model = GPT(GPTConfig())
model.to(device)

# optimize!
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
torchexplorer.watch(model, log=['io', 'params'], backend='standalone')
for i in range(50):
    x, y = train_loader.next_batch()
    x, y = x.to(device), y.to(device)
    optimizer.zero_grad()
    logits, loss = model(x, y)
    loss.backward()
    optimizer.step()
    print(f"step {i}, loss: {loss.item()}")
    break
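For reference, the input/target construction in next_batch can be illustrated with plain Python lists (a sketch of the same indexing, independent of torch; next_batch_indices is a hypothetical helper name, not part of the code above):

```python
# Sketch of the next_batch indexing from DataLoaderLite, using plain
# Python lists instead of tensors. B*T+1 tokens are sliced; inputs are
# the first B*T tokens and targets are the same window shifted by one.
def next_batch_indices(tokens, position, B, T):
    buf = tokens[position : position + B * T + 1]
    x = buf[:-1]   # inputs; would be .view(B, T) on a tensor
    y = buf[1:]    # targets, shifted right by one token
    position += B * T
    # if loading the next batch would run out of bounds, reset
    if position + (B * T + 1) > len(tokens):
        position = 0
    return x, y, position

tokens = list(range(10))
x, y, pos = next_batch_indices(tokens, 0, B=2, T=2)
print(x)    # [0, 1, 2, 3]
print(y)    # [1, 2, 3, 4]
print(pos)  # 4
```

Each target token y[i] is the token that follows x[i] in the data, which is what the language-modeling loss trains the model to predict.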
I can confirm that I've been able to reproduce this -- unfortunately, I don't have the bandwidth this summer to look into what's happening with this architecture. Thank you for the bug report in any case.