(CLI = Command Line Interface)
Check to see if kaggle-cli is installed:
kaggle-cli --version
Install kaggle-cli:
pip install kaggle-cli
or pip3 install kaggle-cli
You may need to upgrade the package if you run into errors:
pip install kaggle-cli --upgrade
or pip3 install kaggle-cli --upgrade
Note 1: You must have a Kaggle username and password. If you logged in to Kaggle using FB or LI, you will have to reset your password, since a password is needed for command-line access to the data.
Note 2: Pick a competition, and ensure you have accepted the rules of that competition. Otherwise, you will not be able to download the data using the CLI.
Example competition: https://www.kaggle.com/c/dogs-vs-cats
Note: the competition name can be found in the URL; here it is dogs-vs-cats.
Accept the rules here: https://www.kaggle.com/c/dogs-vs-cats/rules
Create a data directory and move into it:
ls
mkdir data
cd data
My example:
ubuntu@ip-10-0-0-13:~$ ls
anaconda2 anaconda3 downloads git nbs temp
ubuntu@ip-10-0-0-13:~$ mkdir data
ubuntu@ip-10-0-0-13:~$ cd data
Syntax:
kg config -g -u 'username' -p 'password' -c 'competition'
kg download
Note: here is an example of the warning message I received when I tried to download the data before accepting the competition rules.
My example:
ubuntu@ip-10-0-0-13:~/data$ kg config -g -u 'reshamashaikh' -p 'xxx' -c dogs-vs-cats
ubuntu@ip-10-0-0-13:~/data$ kg download
Starting new HTTPS connection (1): www.kaggle.com
downloading https://www.kaggle.com/c/dogs-vs-cats/download/sampleSubmission.csv
sampleSubmission.csv N/A% | | ETA: --:--:-- 0.0 s/B
Warning: download url for file sampleSubmission.csv resolves to an html document rather than a downloadable file.
Is it possible you have not accepted the competition's rules on the kaggle website?
Note: after accepting the competition rules, I tried downloading again:
kg config -g -u 'username' -p 'password' -c 'competition'
kg download
My example:
ubuntu@ip-10-0-0-13:~/data$ kg config -g -u 'reshamashaikh' -p 'xxx' -c dogs-vs-cats
ubuntu@ip-10-0-0-13:~/data$ kg download
Starting new HTTPS connection (1): www.kaggle.com
downloading https://www.kaggle.com/c/dogs-vs-cats/download/sampleSubmission.csv
Starting new HTTPS connection (1): storage.googleapis.com
sampleSubmission.csv 100% |##################################################################################################################| Time: 0:00:00 320.2 KiB/s
downloading https://www.kaggle.com/c/dogs-vs-cats/download/test1.zip
test1.zip 100% |#############################################################################################################################| Time: 0:00:08 32.5 MiB/s
downloading https://www.kaggle.com/c/dogs-vs-cats/download/train.zip
train.zip 100% |#############################################################################################################################| Time: 0:00:17 31.4 MiB/s
Note: a saved configuration sometimes causes an error the next time you try to download data from a different competition. You can bypass the configuration and pass your username, password and competition name directly in a single command:
kg download -u 'reshamashaikh' -p 'xxx' -c statoil-iceberg-classifier-challenge
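Typing the password on the command line leaves it in your shell history. A minimal sketch of a workaround, assuming you keep the credentials in environment variables: a small wrapper that passes them with the same -u, -p and -c flags used above. The function name kg_download and the variable names KAGGLE_USER and KAGGLE_PASS are hypothetical, chosen only for this illustration.

```shell
# Hypothetical helper: run "kg download" with credentials taken from
# environment variables (KAGGLE_USER / KAGGLE_PASS are names assumed for
# this sketch) instead of typing them into the command.
kg_download() {
  local competition="${1:-}"
  if [ -z "$competition" ]; then
    echo "usage: kg_download <competition>" >&2
    return 1
  fi
  if [ -z "${KAGGLE_USER:-}" ] || [ -z "${KAGGLE_PASS:-}" ]; then
    echo "set KAGGLE_USER and KAGGLE_PASS first" >&2
    return 1
  fi
  # Same flags as the one-line command: -u username, -p password, -c competition
  kg download -u "$KAGGLE_USER" -p "$KAGGLE_PASS" -c "$competition"
}
```

For example, in bash: export KAGGLE_USER='reshamashaikh', then read -s KAGGLE_PASS && export KAGGLE_PASS (the password is not echoed), then kg_download statoil-iceberg-classifier-challenge. This only keeps the password out of your history; it is still passed as an argument to kg while the download runs.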
ls -alt
ubuntu@ip-10-0-0-13:~/data$ ls -alt
total 833964
-rw-rw-r-- 1 ubuntu ubuntu 569546721 Nov 4 18:24 train.zip
drwxrwxr-x 2 ubuntu ubuntu 4096 Nov 4 18:24 .
-rw-rw-r-- 1 ubuntu ubuntu 284321224 Nov 4 18:24 test1.zip
-rw-rw-r-- 1 ubuntu ubuntu 88903 Nov 4 18:23 sampleSubmission.csv
drwxr-xr-x 22 ubuntu ubuntu 4096 Nov 4 18:23 ..
ubuntu@ip-10-0-0-13:~/data$
Note: You will need to install unzip to extract the .zip files:
sudo apt install unzip
unzip train.zip
unzip -q test1.zip (Note: -q means to unzip quietly, suppressing the list of extracted file names)
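The listing that follows shows the training images in train/dogs and train/cats subdirectories. If your train.zip extracts into a single flat train/ folder with names like cat.0.jpg and dog.0.jpg (an assumption about this archive; check with ls train | head first), a sketch like this moves each image into a per-class subdirectory. The function name sort_by_class is hypothetical, and mv -t assumes GNU coreutils (the default on Ubuntu).

```shell
# Hypothetical helper: move images named <class>.<n>.jpg from a flat
# directory into <dir>/<class>s/ (e.g. cat.0.jpg -> cats/cat.0.jpg).
# Assumes the cat.N.jpg / dog.N.jpg naming; adjust the class list if needed.
sort_by_class() {
  local dir="${1:-train}"
  local cls
  for cls in cat dog; do
    mkdir -p "$dir/${cls}s"
    # find -exec ... + batches file names, so thousands of files do not
    # hit "argument list too long"; mv -t is the GNU target-first form
    find "$dir" -maxdepth 1 -type f -name "${cls}.*.jpg" \
      -exec mv -t "$dir/${cls}s/" {} +
  done
}
```

For example, from the data directory: sort_by_class train.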
ubuntu@ip-10-0-0-13:~/nbs/data$ ls train/dogs/dog.1.jpg
train/dogs/dog.1.jpg
ubuntu@ip-10-0-0-13:~/nbs/data$ ls -l train/dogs/ | wc -l
12501
ubuntu@ip-10-0-0-13:~/nbs/data$
ubuntu@ip-10-0-0-13:~/nbs/data$ ls -l train/cats/ | wc -l
12501
ubuntu@ip-10-0-0-13:~/nbs/data$
ubuntu@ip-10-0-0-13:~/nbs/data$ ls test1 | wc -l
12500
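ubuntu@ip-10-0-0-13:~/nbs/data$
Note: ls -l prints an extra "total" line before the file entries, so 12501 means 12,500 images in each of train/dogs and train/cats; plain ls (as used for test1) gives the exact count.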
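A count that does not depend on how ls formats its output is to let find list regular files only. A minimal sketch; the function name check_count is hypothetical, and the expected numbers are whatever you believe the directory should hold.

```shell
# Hypothetical helper: count regular files directly inside a directory
# (subdirectories and ls "total" lines are not counted) and compare
# the result with an expected number.
check_count() {
  local dir="$1" expected="$2" actual
  # $(( ... )) strips any padding that some wc implementations add
  actual=$(( $(find "$dir" -maxdepth 1 -type f | wc -l) ))
  if [ "$actual" -eq "$expected" ]; then
    echo "OK: $dir has $actual files"
  else
    echo "MISMATCH: $dir has $actual files, expected $expected" >&2
    return 1
  fi
}
```

For example: check_count train/dogs 12500 and check_count test1 12500.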
Syntax for submitting a file:
kg submit <submission-file> -u <username> -p <password> -c <competition> -m "<message>"
My example (run from /home/ubuntu/data/iceberg/sub):
(fastai) ubuntu@ip-172-31-2-59:~/data/iceberg/sub$ kg submit resnext50_sz150_zm13.csv -u 'reshamashaikh' -p 'xxx' -c statoil-iceberg-classifier-challenge
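Before submitting, it can save a wasted submission to compare your file with the competition's sample submission (for Dogs vs. Cats, the sampleSubmission.csv downloaded above). A sketch, assuming the sample file has the required header and, as is usual for Kaggle sample submissions, one data row per test item; the function name check_submission is hypothetical.

```shell
# Hypothetical helper: compare a submission file with the competition's
# sample submission before running "kg submit".
# Checks (1) identical header line and (2) identical number of data rows.
# Note: wc -l counts newlines, so a missing final newline undercounts by one.
check_submission() {
  local sub="$1" sample="$2"
  local h_sub h_sample n_sub n_sample
  # tr -d '\r' ignores Windows-style line endings in either file
  h_sub=$(head -n 1 "$sub" | tr -d '\r')
  h_sample=$(head -n 1 "$sample" | tr -d '\r')
  if [ "$h_sub" != "$h_sample" ]; then
    echo "header differs: '$h_sub' vs '$h_sample'" >&2
    return 1
  fi
  n_sub=$(( $(tail -n +2 "$sub" | wc -l) ))
  n_sample=$(( $(tail -n +2 "$sample" | wc -l) ))
  if [ "$n_sub" -ne "$n_sample" ]; then
    echo "row count differs: $n_sub vs $n_sample" >&2
    return 1
  fi
  echo "OK: $n_sub rows"
}
```

Chaining it with && (check_submission my_submission.csv sampleSubmission.csv && kg submit my_submission.csv ...) runs kg submit only if both checks pass; my_submission.csv is a placeholder name.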
It is a good idea to copy 100 or so images into a sample directory; that is enough to check that your scripts are working.
Advice 1: Set aside part of the TRAIN data as a VALIDATION set (the test images are unlabeled): move 1000 each of dogs and cats from train into valid.
> ls valid/cats/ | wc -l
1000
> ls valid/dogs/ | wc -l
1000
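A minimal sketch of this step, assuming the train/cats and train/dogs layout shown earlier and GNU tools (find -printf, shuf). It moves rather than copies, so a validation image no longer appears in train. The function name make_valid and its arguments are hypothetical.

```shell
# Hypothetical helper: move n randomly chosen files per class from
# <root>/train/<class>/ into <root>/valid/<class>/.
# Assumes file names contain no newlines (true for cat.N.jpg / dog.N.jpg).
make_valid() {
  local root="$1" n="$2" cls
  for cls in cats dogs; do
    mkdir -p "$root/valid/$cls"
    # list bare file names, pick n at random, move each one
    find "$root/train/$cls" -maxdepth 1 -type f -printf '%f\n' \
      | shuf -n "$n" \
      | while IFS= read -r f; do
          mv "$root/train/$cls/$f" "$root/valid/$cls/"
        done
  done
}
```

From the data directory, make_valid . 1000 should leave 1000 files in each valid subdirectory and 11,500 in each train subdirectory.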
Advice 2: Do all of your work on sample data
> ls sample/train
> ls sample/valid
> ls sample/train/cats | wc -l
8
> ls sample/valid/cats | wc -l
4
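The sample directory can be filled the same way, but by copying rather than moving, so the full data set stays intact. A sketch, assuming train/ and valid/ already contain cats and dogs subdirectories and GNU tools are available; the function name make_sample and its per-class counts are hypothetical.

```shell
# Hypothetical helper: copy n_train random files per class from train/
# and n_valid per class from valid/ into sample/train/ and sample/valid/.
make_sample() {
  local root="$1" n_train="$2" n_valid="$3" split n cls
  for split in train valid; do
    if [ "$split" = train ]; then n="$n_train"; else n="$n_valid"; fi
    for cls in cats dogs; do
      mkdir -p "$root/sample/$split/$cls"
      # pick n names at random and copy them (originals are left in place)
      find "$root/$split/$cls" -maxdepth 1 -type f -printf '%f\n' \
        | shuf -n "$n" \
        | while IFS= read -r f; do
            cp "$root/$split/$cls/$f" "$root/sample/$split/$cls/"
          done
    done
  done
}
```

make_sample . 8 4 gives the per-class counts listed above; larger values such as make_sample . 100 50 match the "100 or so" suggestion.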
Another option is to use the Kaggle API: https://github.com/Kaggle/kaggle-api