Hello!
I have a question about the size of the LMDB files relative to the .xz files. I intended to start working with the 2M subset for the S2EF task, but I cannot tell whether the preprocessing is failing due to limited storage or for some other reason.
If the train data for S2EF is 16GB once the .xz files are extracted, what sort of size increase should I expect after preprocessing to LMDB?
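For context, here is the quick check I am running to compare the extracted data size against the free disk space before preprocessing. It is just a sketch; `data/s2ef/2M/train` is a placeholder for wherever the extracted files live on your machine.

```python
import os
import shutil

def dir_size_gb(path):
    """Total on-disk size of all regular files under `path`, in GB."""
    total = 0
    for root, _, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if os.path.isfile(fp):
                total += os.path.getsize(fp)
    return total / 1e9

# Placeholder path; substitute the directory holding your extracted .xz data.
data_dir = "data/s2ef/2M/train"
if os.path.isdir(data_dir):
    print(f"extracted data: {dir_size_gb(data_dir):.1f} GB")

# Free space on the volume where the LMDB output would be written.
free_gb = shutil.disk_usage(".").free / 1e9
print(f"free disk space: {free_gb:.1f} GB")
```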
Thank you!