Download Celeb500k with Scrapy
Requirements:
- Python >= 3.5
- Scrapy
- Pillow
Install the Python dependencies with:
pip install scrapy Pillow
Download the URL files into the data folder, following the instructions in that folder.
Run the following command to start the crawl:
sh crawl.sh 5
where 5 is the number of retries. You should run it 5 to 10 times to get all images.
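Rather than fixing the number of runs in advance, one option is to keep rerunning the crawl until a pass adds no new images. The helper below is a minimal, hypothetical sketch of that idea: `crawl_until_stable` and its parameters are not part of this project, and it assumes you supply a command that performs one crawl pass plus a `count_downloaded` function that returns the current number of downloaded images.

```python
import subprocess
from typing import Callable, Sequence


def crawl_until_stable(
    command: Sequence[str],
    count_downloaded: Callable[[], int],
    max_passes: int = 10,
) -> int:
    """Run `command` repeatedly until the download count stops growing.

    Hypothetical helper: stops early as soon as a pass adds no new images,
    otherwise stops after `max_passes` passes. Returns the passes run.
    """
    previous = count_downloaded()
    for passes in range(1, max_passes + 1):
        # check=True raises CalledProcessError if the crawl exits non-zero.
        subprocess.run(list(command), check=True)
        current = count_downloaded()
        if current <= previous:
            return passes  # this pass found nothing new
        previous = current
    return max_passes


# Hypothetical usage, assuming `sh crawl.sh 1` performs a single crawl pass and
# my_counter() returns the number of lines in data/images/<url part>.jl:
# crawl_until_stable(["sh", "crawl.sh", "1"], count_downloaded=my_counter)
```

The default of 10 passes mirrors the upper end of the 5 to 10 runs suggested above, so the loop never runs longer than that advice while still stopping sooner once the image count plateaus.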
Run the following command to count the downloaded folders:
ls -1 data/images/<url part> | wc -l
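The same count can be taken from Python, for example on systems without `ls` and `wc`. This is a sketch: `count_entries` mirrors `ls -1 <dir> | wc -l`, which lists every non-hidden entry (files as well as folders), while `count_subfolders` is an added, hypothetical variant that counts directories only. Pass it the same `data/images/<url part>` path used in the shell command.

```python
import os


def count_entries(path: str) -> int:
    """Count the non-hidden entries in `path`, mirroring `ls -1 path | wc -l`.

    `ls` without -a skips names starting with ".", so they are skipped here too.
    """
    return sum(1 for name in os.listdir(path) if not name.startswith("."))


def count_subfolders(path: str) -> int:
    """Count only the non-hidden subdirectories of `path`."""
    return sum(
        1
        for entry in os.scandir(path)
        if entry.is_dir() and not entry.name.startswith(".")
    )
```

If the folder holds nothing but per-item subfolders, both functions return the same number; a difference means stray files are mixed in.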
Run the following command to count the downloaded images:
wc -l data/images/<url part>.jl
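`wc -l` counts newline characters, so its result assumes one record per line in the `.jl` file (presumably Scrapy's JSON Lines output, one JSON object per downloaded image). The sketch below is a slightly stricter, hypothetical alternative: it skips blank lines, still counts a final line that has no trailing newline, and parses every record so a half-written line from an interrupted crawl is reported instead of counted.

```python
import json


def count_jsonl_records(path: str) -> int:
    """Count the records in a JSON Lines (.jl) file.

    Unlike `wc -l`, blank lines are skipped and a last line without a trailing
    newline is still counted. Each record is parsed, so a truncated or corrupt
    line raises ValueError (json.JSONDecodeError is a subclass).
    """
    count = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                json.loads(line)  # validate the record; raises on bad JSON
                count += 1
    return count
```

For example, `count_jsonl_records("data/images/<url part>.jl")`, with `<url part>` filled in as in the shell command.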