I apply transformations to tensors and store the results in a list. Later, I turn that list into a dataset with Dataset and finally a DataLoader to train my model. To do that, I can simply use:
from torch.utils.data import TensorDataset, DataLoader

l = [tensor1, tensor2, tensor3, ...]
dataset = TensorDataset(torch.stack(l))  # TensorDataset takes tensors, not a Python list
dataloader = DataLoader(dataset)
I wonder what the best practice is for doing this so that RAM doesn't overflow as l grows. Could something like an Iterator avoid it?
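Would something like torch.utils.data.IterableDataset do the trick? A rough sketch of what I have in mind (make_tensors() is just a placeholder for my transformation step):

import torch
from torch.utils.data import IterableDataset, DataLoader

def make_tensors():
    # Placeholder for the real transformation step: yield tensors one by one.
    for _ in range(1000):
        yield torch.randn(3, 224, 224)

class StreamingTensorDataset(IterableDataset):
    # Tensors are produced lazily in __iter__, so the full list never sits in RAM.
    def __iter__(self):
        yield from make_tensors()

dataloader = DataLoader(StreamingTensorDataset(), batch_size=32)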
Save tensors
for idx, tensor in enumerate(dataloader0):
    torch.save(tensor, f"{my_folder}/tensor{idx}.pt")  # pass the tensor itself as the first argument
Create dataset
import os
import torch
from torch.utils.data import Dataset

class FolderDataset(Dataset):
    def __init__(self, folder):
        self.files = os.listdir(folder)
        self.folder = folder

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        return torch.load(f"{self.folder}/{self.files[idx]}")
And then you can wrap it in your own DataLoader, as shown below. If you can't hold the whole dataset in memory, some form of loading from the file system is required.
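A minimal usage sketch (the batch size and worker count are illustrative values, and the saved tensors need matching shapes for the default collation to stack them):

from torch.utils.data import DataLoader

dataset = FolderDataset(my_folder)  # the folder the tensors were saved to above
dataloader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=2)

for batch in dataloader:
    ...  # batches are read from disk on the fly instead of keeping the whole dataset in RAM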