- Fix: cookie domain changes from twitter.com to x.com
- Fix: add a postinstall step that installs the Playwright browser
- Fix: use HiDPI browser
- Fix: adjust scroll height
- Fix: dynamic scroll based on page height
- Fix: consistency of csv headers order
- Feat: convert tweet-harvest csv to gephi format source,target
- Faster crawling (shorter pause between scrolls) and a larger acceptable timeout, so the crawler still works when the network is slow.
- Remove displayed images/videos whenever possible to reduce the number of scrolls.
- Reduce the number of unnecessary logs
- Add image_url to the output CSV file (if exists).
- Add location to the output CSV file (if exists).
- Fixed inconsistent delimiter and CSV formatting in crawl functionality.
- The delimiter has been standardized to use commas consistently throughout the CSV file.
- Ensured proper conversion of object values to strings in the crawl functionality.
- Improved CSV formatting and enhanced reliability of data extraction from Twitter data.
- Add the `SEARCH_TAB`, `--search-tab`, or `--tab` option to specify which tab to search for tweets. The options are `LATEST` and `TOP`; the default is `LATEST`.
- Implemented optional exponential backoff for rate limit handling. The wait time between retries will now be calculated dynamically based on the number of attempts made, resulting in fewer requests during the rate-limit window. This should help to reduce the risk of account bans. To utilize this feature, set the `ENABLE_EXPONENTIAL_BACKOFF` environment variable to true.
- If `ENABLE_EXPONENTIAL_BACKOFF` is unset or set to false, rate limit handling defaults to the previous flat 1-minute retry timeout.
- While the new optional feature greatly minimizes the risk of account bans due to rate limit exceptions, it might not be suitable for all use cases due to increased wait times between the retries. Consider your scenario before enabling this feature.
- Kudos to @alvinmatias69 for the contribution!
- Implemented a recursive call that automatically clicks the "Retry" button whenever it appears, on the assumption that it is caused by rate limiting. This repeats until the button no longer appears and the target number of tweets is reached. The rate limit typically lasts 5-15 minutes, which allows roughly 800 tweets to be collected every 5-15 minutes.
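
The two retry strategies described in the entries above (the flat 1-minute timeout and the optional exponential backoff) can be illustrated with a minimal sketch. This is not the project's actual code: the helper names `retryDelayMs` and `isBackoffEnabled`, the doubling factor, the 1-minute base, and the 15-minute cap are all assumptions chosen for illustration. Only the `ENABLE_EXPONENTIAL_BACKOFF` variable and the flat 1-minute fallback come from the notes above.

```typescript
// Hypothetical sketch; names and constants are illustrative assumptions,
// not tweet-harvest's real implementation.

// Flat retry timeout mentioned in the changelog (1 minute).
const FLAT_DELAY_MS = 60_000;

// Assumed upper bound so the wait never grows without limit (15 minutes).
const MAX_DELAY_MS = 15 * 60_000;

// Returns true only when the variable is explicitly set to "true".
function isBackoffEnabled(env: Record<string, string | undefined>): boolean {
  return (env.ENABLE_EXPONENTIAL_BACKOFF ?? "").toLowerCase() === "true";
}

// Computes the wait before retry number `attempt` (1-based).
// Without backoff the delay is always the flat timeout.
// With backoff the delay doubles per attempt: base, 2x base, 4x base, ...
// and is capped at MAX_DELAY_MS.
function retryDelayMs(
  attempt: number,
  exponentialBackoff: boolean,
  baseMs: number = FLAT_DELAY_MS,
  maxMs: number = MAX_DELAY_MS
): number {
  if (!exponentialBackoff) {
    return baseMs;
  }
  const exponent = Math.max(0, attempt - 1);
  return Math.min(baseMs * Math.pow(2, exponent), maxMs);
}
```

Under these assumptions, a caller would pass its environment (for example `process.env` in Node.js) to `isBackoffEnabled` and then wait `retryDelayMs(attempt, enabled)` milliseconds before clicking "Retry" again. The cap keeps the longest wait in line with the 5-15 minute rate-limit window mentioned above.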