A powerful Python script that allows you to scrape messages and media from Telegram channels using the Telethon library. Features include real-time continuous scraping, media downloading, and data export capabilities.
- Scrape messages from multiple Telegram channels
- Download media files (photos, documents)
- Real-time continuous scraping
- Export data to JSON and CSV formats
- SQLite database storage
- Resume capability (saves progress)
- Media reprocessing for failed downloads
- Progress tracking
- Interactive menu interface
Before running the script, you'll need:
- Python 3.7 or higher
- Telegram account
- API credentials from Telegram
pip install -r requirements.txt
Contents of requirements.txt:
telethon
aiohttp
asyncio
- Visit https://my.telegram.org/auth
- Log in with your phone number
- Click on "API development tools"
- Fill in the form:
  - App title: Your app name
  - Short name: Your app short name
  - Platform: Can be left as "Desktop"
  - Description: Brief description of your app
- Click "Create application"
- You'll receive:
  - api_id: A number
  - api_hash: A string of letters and numbers
Keep these credentials safe; you'll need them to run the script!
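If you would rather not type the credentials each time or hard-code them in a script, you can keep them in environment variables. A minimal sketch, assuming the hypothetical variable names TG_API_ID and TG_API_HASH (the scraper itself asks for these values interactively):

```python
import os


def load_credentials(env=None):
    """Return (api_id, api_hash) read from environment variables.

    TG_API_ID and TG_API_HASH are hypothetical names chosen for this sketch.
    """
    env = os.environ if env is None else env
    raw_id = env.get("TG_API_ID", "").strip()
    api_hash = env.get("TG_API_HASH", "").strip()
    # api_id is a number; api_hash is a string of letters and numbers.
    if not raw_id.isdecimal():
        raise ValueError("Set TG_API_ID to your numeric api_id")
    if not api_hash.isalnum():
        raise ValueError("Set TG_API_HASH to your api_hash")
    return int(raw_id), api_hash
```

The returned pair can be passed straight to Telethon, for example TelegramClient("session", *load_credentials()).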
- Clone the repository:
git clone https://github.com/unnohwn/telegram-scraper.git
cd telegram-scraper
- Install requirements:
pip install -r requirements.txt
- Run the script:
python telegram-scraper.py
- On first run, you'll be prompted to enter:
  - Your API ID
  - Your API Hash
  - Your phone number (with country code)
  - When asked a second time for your phone number or a bot token, enter your phone number again
  - Verification code (sent to your Telegram)
When scraping a channel for the first time, please note:
- The script will attempt to retrieve the entire channel history, starting from the oldest messages
- Initial scraping can take several minutes or even hours, depending on:
  - The total number of messages in the channel
  - Whether media downloading is enabled
  - The size and number of media files
  - Your internet connection speed
  - Telegram's rate limiting
- The script uses pagination and maintains state, so if interrupted, it can resume from where it left off
- Progress percentage is displayed in real-time to track the scraping status
- Messages are stored in the database as they are scraped, so you can start analyzing available data even before the scraping is complete
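The resume-where-it-left-off behaviour can be sketched with Telethon's iter_messages: with reverse=True it yields messages oldest first, and offset_id then acts as an exclusive lower bound, so passing the highest stored message ID fetches only newer messages. This is an illustration under assumptions, not the script's actual code: open_db and scrape_channel are hypothetical names, the table is reduced to a few columns, and the resume point is read from the database rather than from a separate state file. The client is passed in (any object that behaves like a connected telethon.TelegramClient), so the logic can be tried offline with a stub.

```python
import sqlite3


def open_db(path):
    """Open (or create) a reduced messages table used only by this sketch."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY, message_id INTEGER UNIQUE, date TEXT, message TEXT)"
    )
    return conn


async def scrape_channel(client, conn, channel):
    """Store messages newer than the last saved one; return how many were added."""
    # Resume point: the highest message ID already stored (0 on the first run).
    last_id = conn.execute(
        "SELECT COALESCE(MAX(message_id), 0) FROM messages"
    ).fetchone()[0]
    stored = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]

    # limit=0 downloads no messages; .total carries the channel's message count.
    total = (await client.get_messages(channel, limit=0)).total or 0

    added = 0
    # reverse=True walks oldest to newest, so offset_id is an exclusive lower bound.
    async for msg in client.iter_messages(channel, offset_id=last_id, reverse=True):
        cur = conn.execute(
            "INSERT OR IGNORE INTO messages (message_id, date, message) VALUES (?, ?, ?)",
            (msg.id, msg.date.isoformat() if msg.date else None, msg.message),
        )
        conn.commit()  # commit per message so rows are queryable mid-scrape
        added += cur.rowcount
        if total:
            pct = min((stored + added) / total, 1.0)
            print(f"\r{channel}: {pct:.1%}", end="", flush=True)
    print()
    return added
```

Committing after every row keeps data available while a long first scrape is still running, at some cost in speed; committing in batches is a common trade-off.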
The script provides an interactive menu with the following options:
- [A] Add new channel
  - Enter the channel ID or channel username
- [R] Remove channel
  - Remove a channel from the scraping list
- [S] Scrape all channels
  - One-time scraping of all configured channels
- [M] Toggle media scraping
  - Enable/disable downloading of media files
- [C] Continuous scraping
  - Real-time monitoring of channels for new messages
- [E] Export data
  - Export to JSON and CSV formats
- [V] View saved channels
  - List all saved channels
- [L] List account channels
  - List all channels on your account, with their IDs
- [Q] Quit
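Option [L] amounts to iterating over the account's dialogs. A sketch using Telethon's iter_dialogs(), assuming a connected client; list_account_channels is a hypothetical name. Dialog.id is the "marked" ID (channels carry the -100 prefix), which is the form option [A] accepts. Telethon also reports supergroups as channels (Dialog.is_channel is True for them), so add a check for not dialog.is_group if you want broadcast channels only.

```python
async def list_account_channels(client):
    """Return (id, name) pairs for every channel dialog on the account.

    `client` is expected to behave like a connected telethon.TelegramClient.
    """
    channels = []
    async for dialog in client.iter_dialogs():
        # is_channel is True for broadcast channels and supergroups alike.
        if dialog.is_channel:
            channels.append((dialog.id, dialog.name))
    return channels
```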
You can use either:
- Channel username (e.g., channelname)
- Channel ID (e.g., -1001234567890)
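Telethon's get_entity() interprets strings as usernames, links or phone numbers and integers as IDs, so numeric input is best converted to an int before it is resolved. A minimal sketch of that normalisation; parse_channel is a hypothetical helper whose result is meant to be passed to client.get_entity():

```python
def parse_channel(text):
    """Normalise menu input for client.get_entity() (hypothetical helper).

    "-1001234567890" becomes the int -1001234567890 (a channel ID);
    "@channelname" or "channelname" becomes "channelname" (a username).
    """
    value = text.strip()
    digits = value[1:] if value.startswith("-") else value
    if digits.isdecimal():
        return int(value)
    value = value.lstrip("@")
    if not value:
        raise ValueError("empty channel identifier")
    return value
```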
Data is stored in SQLite databases, one per channel:
- Location: ./channelname/channelname.db
- Table: messages
  - id: Primary key
  - message_id: Telegram message ID
  - date: Message timestamp
  - sender_id: Sender's Telegram ID
  - first_name: Sender's first name
  - last_name: Sender's last name
  - username: Sender's username
  - message: Message text
  - media_type: Type of media (if any)
  - media_path: Local path to downloaded media
  - reply_to: ID of the message being replied to (if any)
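For reference, a table matching the column list above could be created as follows. Only the column names come from this README; the types, the UNIQUE constraint on message_id, and the helper names open_channel_db and top_senders are assumptions, so check the script's own CREATE TABLE statement before relying on them. The second function shows the kind of query you can run while a scrape is still in progress.

```python
import sqlite3

# Column names follow the README; the types and constraints are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    message_id  INTEGER UNIQUE,
    date        TEXT,
    sender_id   INTEGER,
    first_name  TEXT,
    last_name   TEXT,
    username    TEXT,
    message     TEXT,
    media_type  TEXT,
    media_path  TEXT,
    reply_to    INTEGER
)
"""


def open_channel_db(path):
    """Open a channel database (e.g. ./channelname/channelname.db), creating the table."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def top_senders(conn, limit=10):
    """Example query: the most active senders by message count."""
    return conn.execute(
        "SELECT sender_id, COUNT(*) AS n FROM messages "
        "GROUP BY sender_id ORDER BY n DESC LIMIT ?",
        (limit,),
    ).fetchall()
```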
Media files are stored in:
- Location: ./channelname/media/
- Files are named using message ID or original filename
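A sketch of how a media file's target path could be derived from the naming rule above, assuming a hypothetical helper media_path with a base directory of "." by default. In Telethon the original filename, when there is one, is available as message.file.name. The sketch keeps only the last component of that name so a crafted value such as ../x.jpg cannot escape the media folder:

```python
import os
from pathlib import Path


def media_path(channel, message_id, original_name=None, base="."):
    """Return <base>/<channel>/media/<name> for one message's media file.

    Uses the original filename when there is one, otherwise the message ID.
    """
    media_dir = Path(base) / channel / "media"
    # Keep only the last path component so the file stays inside media_dir.
    name = os.path.basename(original_name or "")
    return media_dir / (name or str(message_id))
```

Under this rule two messages carrying files with the same name map to the same path; prefixing the message ID (for example 42-report.pdf) is one way to keep them apart.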
Data can be exported in two formats:
- CSV: ./channelname/channelname.csv
  - Human-readable spreadsheet format
  - Easy to import into Excel/Google Sheets
- JSON: ./channelname/channelname.json
  - Structured data format
  - Ideal for programmatic processing
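The export step can be approximated with Python's standard csv, json and sqlite3 modules. A minimal sketch, assuming the messages table described earlier; export_channel is a hypothetical name and the script's real output may differ in column order or encoding:

```python
import csv
import json
import sqlite3
from pathlib import Path


def export_channel(db_path, out_dir, channel):
    """Export the messages table to <out_dir>/<channel>.csv and .json."""
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute("SELECT * FROM messages ORDER BY message_id")
        columns = [d[0] for d in cur.description]
        rows = [dict(zip(columns, r)) for r in cur.fetchall()]
    finally:
        conn.close()

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    csv_path, json_path = out / f"{channel}.csv", out / f"{channel}.json"

    # utf-8-sig writes a byte-order mark, which helps Excel detect UTF-8.
    with open(csv_path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)

    # ensure_ascii=False keeps non-Latin text readable in the JSON file.
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False, indent=2)
    return csv_path, json_path
```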
The continuous scraping feature ([C] option) allows you to:
- Monitor channels in real-time
- Automatically download new messages
- Download media as it's posted
- Run indefinitely until interrupted (Ctrl+C)
- Maintain state between runs
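Real-time monitoring could be built either on Telethon's event handlers (events.NewMessage) or on polling. The sketch below shows only a polling variant, as one possible design rather than the script's actual approach: it repeatedly awaits a caller-supplied coroutine function (for example one that fetches messages newer than the last stored ID for each saved channel), sleeps between rounds, and stops cleanly on Ctrl+C. run_continuously, main, poll_once and the 60-second default interval are all hypothetical.

```python
import asyncio


async def run_continuously(poll_once, interval=60.0, max_rounds=None):
    """Await poll_once() every `interval` seconds.

    `max_rounds` exists only so the loop can be tested; leave it as None
    to run until interrupted. Returns the number of rounds performed.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        try:
            await poll_once()
        except Exception as exc:  # keep monitoring even if one round fails
            print(f"Poll failed: {exc!r}")
        rounds += 1
        await asyncio.sleep(interval)
    return rounds


def main(poll_once):
    try:
        asyncio.run(run_continuously(poll_once))
    except KeyboardInterrupt:  # Ctrl+C ends monitoring
        print("Stopped continuous scraping.")
```

Because progress lives in the database, stopping and restarting the loop simply resumes from the newest stored message.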
The script can download:
- Photos
- Documents
- Other media types supported by Telegram
When downloading media, it:
- Automatically retries failed downloads
- Skips existing files to avoid duplicates
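Skipping files that already exist can be sketched as a check before calling Telethon's download_media(), which returns the path it actually saved to (or None if the message has no downloadable media). download_if_missing is a hypothetical helper, and the client is assumed to be a connected TelegramClient:

```python
import os


async def download_if_missing(client, message, path):
    """Download a message's media to `path` unless that file already exists."""
    if os.path.exists(path):
        return path  # already on disk: skip to avoid a duplicate download
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    # download_media returns the saved path, which may differ from `path`.
    return await client.download_media(message, file=path)
```

Because the saved path may differ from the one requested, storing the returned value (for instance in the media_path column) and checking that on later runs is more reliable than checking only the requested name.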
The script includes:
- Automatic retry mechanism for failed media downloads
- State preservation in case of interruption
- Flood control compliance
- Error logging for failed operations
- Respects Telegram's rate limits
Limitations:
- Can only access public channels or channels you're a member of
- Media download size limits apply as per Telegram's restrictions
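The retry and flood-control behaviour listed above can be sketched as a small wrapper. Telethon raises FloodWaitError, whose seconds attribute says how long Telegram asks you to wait; this wrapper honours that value and otherwise falls back to exponential backoff. with_retries and its parameters are hypothetical, and treating every exception as retryable is a simplification; a real implementation would likely retry only network and flood errors.

```python
import asyncio


async def with_retries(operation, attempts=3, base_delay=1.0, sleep=asyncio.sleep):
    """Await operation() and retry it when it raises.

    If the exception has a `seconds` attribute, as Telethon's FloodWaitError
    does, wait that many seconds; otherwise back off exponentially
    (base_delay, 2 * base_delay, 4 * base_delay, ...).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await operation()
        except Exception as exc:  # simplification: every error is retryable
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller log the failure
            delay = getattr(exc, "seconds", None)
            if delay is None:
                delay = base_delay * 2 ** attempt
            print(f"Attempt {attempt + 1} failed ({exc!r}); retrying in {delay}s")
            await sleep(delay)
```

For example, path = await with_retries(lambda: client.download_media(message, file=target)) creates a fresh download coroutine on every attempt.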
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
This tool is for educational purposes only. Make sure to:
- Respect Telegram's Terms of Service
- Obtain necessary permissions before scraping
- Use responsibly and ethically
- Comply with data protection regulations