float-trip 0ed516b714 'distributor' -> 'batcher' 2023-07-11 15:57:28 +00:00

Reddit Ingest

Distributed server/client setup for ingesting all of Reddit.

  • Scales to multiple clients.
  • Supports Reddit authentication.
  • Tolerates clients losing state or going offline.
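
This readme does not spell out how work is divided between the batcher and the fetchers. As a rough illustration only, the sketch below assumes the batcher hands out contiguous ranges of Reddit's base-36 IDs and that each range is turned into a batch of fullnames (`t3_` for posts, `t1_` for comments). The function names and the batch size of 100 are hypothetical and not taken from `batcher.py`.

```python
import string

# Base-36 digits as used in Reddit IDs: 0-9 followed by a-z.
ALPHABET = string.digits + string.ascii_lowercase


def to_base36(n: int) -> str:
    """Encode a non-negative integer as a lowercase base-36 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    digits = []
    while n:
        n, rem = divmod(n, 36)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def make_batches(start_id: str, end_id: str, batch_size: int = 100, prefix: str = "t3_"):
    """Yield lists of fullnames covering start_id..end_id inclusive.

    Hypothetical helper: batch_size and prefix are assumptions for this sketch.
    """
    start, end = int(start_id, 36), int(end_id, 36)
    for lo in range(start, end + 1, batch_size):
        hi = min(lo + batch_size, end + 1)
        yield [prefix + to_base36(i) for i in range(lo, hi)]
```

Because each batch is a self-contained list of IDs, a batch that was handed to a client that went offline could simply be reissued to another one, which is one way the tolerance described above might be achieved.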

Setup:

# ...install PostgreSQL...
pip install -r requirements.txt
# ...modify example yamls...
mv batcher_config.example.yaml batcher_config.yaml
mv fetcher_config.example.yaml fetcher_config.yaml
mkdir logs
bash compile_proto.sh

# Run one instance of batcher.py:
python batcher.py

# And several instances of fetcher.py:
python fetcher.py
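
Starting several fetchers by hand gets tedious, so a small launcher can help. The sketch below is an assumption about how one might do it, not part of this repository: it starts the same command several times and waits for every process to exit. Whether each fetcher needs its own config file is not stated here, so the sketch reuses one command for all of them.

```python
import subprocess
import sys


def launch(command, count):
    """Start `count` copies of `command` and return the process handles."""
    return [subprocess.Popen(command) for _ in range(count)]


def wait_all(procs):
    """Block until every process exits and return their exit codes in order."""
    return [p.wait() for p in procs]


# Example usage (hypothetical): run three fetchers from the repository root.
# fetchers = launch([sys.executable, "fetcher.py"], count=3)
# wait_all(fetchers)
```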

Getting a refresh token: https://praw.readthedocs.io/en/stable/tutorials/refresh_token.html#obtaining-refresh-tokens
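
Once a refresh token has been obtained, it is typically passed to PRAW together with the app credentials. The sketch below shows one way to do that, assuming the values live in a dict with the keys `client_id`, `client_secret`, `refresh_token`, and `user_agent`. Those key names are an assumption for illustration and may not match the layout of `fetcher_config.yaml`.

```python
REQUIRED_KEYS = ("client_id", "client_secret", "refresh_token", "user_agent")


def reddit_kwargs(config: dict) -> dict:
    """Pick the PRAW credentials out of a config dict, failing early if any are missing."""
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise KeyError("missing config keys: " + ", ".join(missing))
    return {key: config[key] for key in REQUIRED_KEYS}


def make_reddit(config: dict):
    """Build an authenticated praw.Reddit instance from the config dict."""
    # Imported lazily so reddit_kwargs() can be used without PRAW installed.
    import praw

    return praw.Reddit(**reddit_kwargs(config))
```

Validating the keys before constructing the client gives a clear error at startup instead of an authentication failure on the first request.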