📄 What Is the X.com Tweet Archiver?
The X.com Tweet Archiver is a forensic tool designed by Digital Shield Inc. to capture and preserve individual tweets (including comments) from X.com (formerly Twitter). This tool creates OCR-searchable PDF files, each labeled with a unique DOC ID and Tweet ID, making it ideal for courtroom presentation, eDiscovery production, and long-term preservation.
⚙️ How It Works
1. GUI-Driven Workflow
– Launch Chrome from the GUI
– Manually log in to your X.com account
2. Feed in Tweet URLs
– Load a CSV file containing tweet URLs (one per line)
3. Automated Capture
– Opens each tweet individually
– Scrolls the thread to load all visible replies
– Clicks ‘Show probable spam’ if found
4. PDF Generation
– Takes full scrolling screenshots of each tweet and its context
– Uses OCRmyPDF to create searchable PDFs in PDF/A format
– PDF filenames follow the format: DOC###-TweetID.pdf
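The scrolling in step 3 can be pictured as a loop that keeps scrolling until the page height stops growing. The sketch below is an illustration only, not the tool's actual code: the function name scroll_until_stable, the pause length, and the round limit are assumptions. The only Selenium call it relies on is execute_script on a WebDriver instance.

```python
import time


def scroll_until_stable(driver, pause=1.5, max_rounds=30, sleep=time.sleep):
    """Scroll to the bottom repeatedly until the page height stops changing.

    `driver` is expected to be a Selenium WebDriver (any object exposing
    execute_script works). Returns the number of scroll rounds performed.
    The pause and max_rounds defaults are illustrative guesses.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        sleep(pause)  # give lazily loaded replies time to render
        rounds += 1
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new appeared; treat the thread as fully loaded
        last_height = new_height
    return rounds
```

The round limit keeps a very long thread from scrolling forever; a real capture would also need to locate and click the 'Show probable spam' control, which is not shown here.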
📁 Output Structure
tweet_pdfs/
├── DOC001-1781234567890.pdf
├── DOC001-1781234567890.txt
├── screens/1781234567890/slice_01.png
└── capture_log.csv
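To make the layout concrete, here is a minimal sketch of how a tweet URL and a DOC ID could map onto the paths shown in the tree. The helper names tweet_id_from_url and output_paths are hypothetical, and the assumption that the Tweet ID is the number after /status/ in the URL is ours, not something the tool documents.

```python
import re
from pathlib import PurePosixPath

# Assumption: tweet URLs look like https://x.com/<user>/status/<digits>
STATUS_RE = re.compile(r"/status/(\d+)")


def tweet_id_from_url(url):
    """Return the numeric tweet ID embedded in a tweet URL."""
    match = STATUS_RE.search(url)
    if match is None:
        raise ValueError(f"no tweet ID found in URL: {url!r}")
    return match.group(1)


def output_paths(url, doc_id, root="tweet_pdfs"):
    """Build the PDF, text sidecar, and screenshot-slice paths for one tweet."""
    tweet_id = tweet_id_from_url(url)
    base = PurePosixPath(root)
    stem = f"{doc_id}-{tweet_id}"
    return {
        "pdf": str(base / f"{stem}.pdf"),
        "sidecar": str(base / f"{stem}.txt"),
        "slices_dir": str(base / "screens" / tweet_id),
    }
```

For example, output_paths("https://x.com/someuser/status/1781234567890", "DOC001") yields the three paths listed for that tweet in the tree above.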
📌 Key Features
– Graphical interface; no coding knowledge required
– Real-time status tracking and tweet count
– Automatically scrolls and loads comment threads
– Embedded OCR using Tesseract and OCRmyPDF
– Outputs include searchable PDF, OCR text sidecar, and capture log
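As a rough illustration of the OCR step, the sketch below builds an OCRmyPDF command line that produces a PDF/A file plus a text sidecar from a single image. The function names and the DPI value are assumptions; how the tool itself invokes OCRmyPDF is not documented here. The flags used (--output-type, --sidecar, --image-dpi) are standard OCRmyPDF options.

```python
import subprocess


def build_ocr_command(image_path, pdf_path, sidecar_path, dpi=144):
    """Return the argument list for one OCRmyPDF run.

    dpi is an assumed value: screenshots carry no reliable resolution,
    so OCRmyPDF needs --image-dpi when the input is an image.
    """
    return [
        "ocrmypdf",
        "--output-type", "pdfa",          # write PDF/A output
        "--sidecar", str(sidecar_path),   # also write the OCR text to a .txt file
        "--image-dpi", str(dpi),
        str(image_path),
        str(pdf_path),
    ]


def run_ocr(image_path, pdf_path, sidecar_path):
    """Run OCRmyPDF and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_ocr_command(image_path, pdf_path, sidecar_path), check=True)
```

Keeping command construction separate from execution makes the exact command easy to log alongside each capture.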
✅ Requirements
– Python 3.8 or higher
– Google Chrome + ChromeDriver
– Python packages: selenium, pillow
– OCRmyPDF with Tesseract OCR and Ghostscript installed
On Windows:
choco install tesseract ghostscript
pip install ocrmypdf
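Before a first run, it can help to confirm that the command-line dependencies are reachable on PATH. The following is a small sketch under our own assumptions: the function name missing_tools is hypothetical, and the Ghostscript executable names (gs on Linux/macOS, gswin64c or gswin32c on Windows) are the usual ones but may differ on your system.

```python
import shutil

REQUIRED_TOOLS = ("ocrmypdf", "tesseract")
# Ghostscript ships under different executable names per platform.
GHOSTSCRIPT_NAMES = ("gs", "gswin64c", "gswin32c")


def missing_tools(which=shutil.which):
    """Return the names of required tools that cannot be found on PATH."""
    missing = [tool for tool in REQUIRED_TOOLS if which(tool) is None]
    if not any(which(name) for name in GHOSTSCRIPT_NAMES):
        missing.append("ghostscript")
    return missing


if __name__ == "__main__":
    problems = missing_tools()
    print("All tools found" if not problems else f"Missing: {', '.join(problems)}")
```

ChromeDriver is left out of the check on purpose, since recent Selenium releases can locate or download a matching driver on their own.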
▶️ Usage Instructions
1. Run the script: python x_tweet_archiver.py
2. Click ‘Launch Chrome & Log In’ and manually sign in to X.com
3. Select a CSV file with tweet URLs
4. Enter a DOC ID prefix (e.g., DOC001)
5. Click ‘Start Capture’
6. PDFs and log files will be saved in the tweet_pdfs/ directory
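For reference, the CSV expected in step 3 is simply one tweet URL per line. The sketch below shows one way such a file could be read; read_tweet_urls is a hypothetical helper, and skipping blank lines and non-URL rows (such as a header) is our assumption about sensible behavior, not documented behavior of the tool.

```python
import csv


def read_tweet_urls(csv_file):
    """Collect tweet URLs from the first column of an open CSV file.

    Blank lines and rows that do not start with http:// or https://
    (for example a header row) are skipped.
    """
    urls = []
    for row in csv.reader(csv_file):
        if not row:
            continue  # blank line
        url = row[0].strip()
        if url.lower().startswith(("http://", "https://")):
            urls.append(url)
    return urls
```

A typical call would be: with open("tweets.csv", newline="", encoding="utf-8") as f: urls = read_tweet_urls(f), where tweets.csv is a placeholder filename.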
🛡️ Forensic Reliability
The tool is built for forensic professionals and eDiscovery teams, and each capture is recorded in a timestamped log (capture_log.csv). PDFs are searchable and compatible with legal review tools. Tweets and the replies loaded during capture are preserved visually in stitched screenshots and converted to machine-readable text using OCR.
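To illustrate what a timestamped log entry might look like, here is a minimal sketch that appends one row per capture to a CSV log. The column names, the status values, and the function name append_log_row are all assumptions made for illustration; the actual layout of capture_log.csv may differ.

```python
import csv
from datetime import datetime, timezone

# Hypothetical column layout; the real capture_log.csv may use other fields.
LOG_FIELDS = ["captured_at_utc", "doc_id", "tweet_id", "url", "pdf_file", "status"]


def append_log_row(log_file, doc_id, tweet_id, url, pdf_file, status, now=None):
    """Append one capture record, writing the header if the log is empty."""
    now = now or datetime.now(timezone.utc)
    writer = csv.DictWriter(log_file, fieldnames=LOG_FIELDS)
    if log_file.tell() == 0:
        writer.writeheader()
    writer.writerow({
        "captured_at_utc": now.isoformat(),  # ISO 8601 with explicit UTC offset
        "doc_id": doc_id,
        "tweet_id": tweet_id,
        "url": url,
        "pdf_file": pdf_file,
        "status": status,
    })
```

Recording the time in UTC with an explicit offset avoids ambiguity when captures made in different time zones are later compared.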
📬 Contact
To license the X.com Tweet Archiver or request support:
📧 consulting@digitalshield.net
🌐 www.digitalshield.net