• Show this post
    So I've been playing around with the data dump for releases and would like to request a data dump that does not contain the <videos> and <video> tags. From a test chunk of 1000 MB, here's what I found:

    Start chunk
    Size: 1000 MB, Lines: ~5 500 000

    Chunk after removing most of the video tags
    Size: 721 MB, Lines: ~2 700 000

    Saving 28% in size and 2 800 000 lines of XML.
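
    For anyone who wants to try the same thing locally, here is a minimal sketch of one way to drop the video data while streaming through the dump. It is not necessarily how the chunk above was produced. The function name `strip_videos` is made up for illustration, and the code assumes each <release> sits directly under the root element and that <videos> is a direct child of <release>.

```python
import xml.etree.ElementTree as ET


def strip_videos(src, dst):
    """Copy every <release> from src to dst, leaving out any <videos> child.

    src and dst are binary file objects. Returns the number of <videos>
    blocks that were dropped. Hypothetical helper, not a Discogs tool.
    """
    removed = 0
    context = ET.iterparse(src, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    dst.write(f"<{root.tag}>\n".encode("utf-8"))
    for event, elem in context:
        if event == "end" and elem.tag == "release":
            # Assumption: <videos> is a direct child of <release>.
            for videos in elem.findall("videos"):
                elem.remove(videos)
                removed += 1
            elem.tail = None  # do not carry over whitespace after the element
            dst.write(ET.tostring(elem, encoding="utf-8"))
            dst.write(b"\n")
            root.clear()  # free finished releases so memory stays flat
    dst.write(f"</{root.tag}>\n".encode("utf-8"))
    return removed
```

    Usage would be something like opening the uncompressed dump with `open(path, "rb")` and the output with `open(out_path, "wb")`, then calling `strip_videos(src, dst)`. Streaming matters here because the full dump is far too large to parse into memory in one go.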

    Extrapolating this to the whole data dump would mean that the size of the dataset would decrease from about 75 GB to 54 GB, a whopping 20 GB of uncompressed data. It would also reduce the total number of lines to operate on by about 220 million, give or take.
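
    The extrapolation is simple scaling, sketched below. It assumes the 1000 MB test chunk is representative of the whole dump and treats 1 GB as 1000 MB; the function name `extrapolate` is only for illustration.

```python
def extrapolate(chunk_before_mb, chunk_after_mb, lines_before, lines_after, dump_gb):
    """Scale the savings measured on one test chunk up to the full dump.

    Assumes the chunk is representative and that 1 GB = 1000 MB.
    Returns (estimated dump size in GB, estimated number of lines removed).
    """
    size_ratio = chunk_after_mb / chunk_before_mb
    est_dump_gb = dump_gb * size_ratio
    chunks_in_dump = dump_gb * 1000 / chunk_before_mb
    est_lines_removed = (lines_before - lines_after) * chunks_in_dump
    return est_dump_gb, est_lines_removed


size_gb, lines_removed = extrapolate(1000, 721, 5_500_000, 2_700_000, 75)
print(f"~{size_gb:.0f} GB, ~{lines_removed / 1e6:.0f} million fewer lines")
```

    With the rounded chunk figures from this post, that comes out at roughly 54 GB and a bit over 200 million fewer lines, which is in the same ballpark as the numbers above.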

    Please, developers, consider the people who work on improving the quality of the database and give us this option. Besides, the video data is ephemeral and third-party compared to the metadata that actually describes the release.

  • Show this post
    Personally I like this idea. I think YouTube links are external resources, not part of the actual Discogs data.
