# Instagram Profile Scraper
Scrapes the last N posts from a list of Instagram profiles and saves the results as CSV and Markdown. Uses Playwright with a real authenticated browser session — no API keys required.
## What it does

- Reads a list of Instagram profile URLs from `profiles.txt`
- Opens a Chromium browser (visible, non-headless)
- On first run: prompts you to log in manually, then saves the session to `auth_state.json`
- On subsequent runs: reuses the saved session automatically
- Visits each profile, collects the last 5 post URLs
- Visits each post and extracts: date, caption, likes, image URLs, hashtags, mentions, location, media type
- Writes combined results to `output.csv` and `output.md`
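The hashtag and mention extraction from a caption can be sketched with plain regexes; `extract_tags` is an illustrative helper name, not necessarily the one used in `scraper.py`:

```python
import re

def extract_tags(caption: str) -> tuple[list[str], list[str]]:
    # Hashtags: '#' followed by word characters.
    # Mentions: '@' followed by username characters (letters, digits, '.', '_').
    hashtags = re.findall(r"#(\w+)", caption)
    mentions = re.findall(r"@([\w.]+)", caption)
    return hashtags, mentions

print(extract_tags("Sunset run with @alice.b #running #sunset"))
# (['running', 'sunset'], ['alice.b'])
```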
## Setup

Requires Python 3.11+ and uv.

```sh
git clone <repo-url>
cd instagram-scraper
uv sync
uv run playwright install chromium
```
## Configuration

`profiles.txt` — one Instagram profile URL per line. Lines can optionally be prefixed with a number and a tab (the scraper strips them):

```
https://www.instagram.com/username1/
https://www.instagram.com/username2/
https://www.instagram.com/username3/
```
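The prefix stripping could look like the following; `clean_profile_line` is a hypothetical name for illustration:

```python
import re

def clean_profile_line(line: str) -> str:
    # Strip an optional leading "<number><TAB>" prefix, then surrounding whitespace.
    return re.sub(r"^\d+\t", "", line).strip()

print(clean_profile_line("3\thttps://www.instagram.com/username3/\n"))
# https://www.instagram.com/username3/
```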
Constants in `scraper.py` (edit directly):

| Constant | Default | Description |
|---|---|---|
| `POSTS_PER_PROFILE` | `5` | How many posts to scrape per profile |
| `PROFILES_FILE` | `profiles.txt` | Input file path |
| `OUTPUT_CSV` | `output.csv` | CSV output path |
| `OUTPUT_MD` | `output.md` | Markdown output path |
## Usage

```sh
uv run python scraper.py
```

On first run, a Chromium window opens. Log in to Instagram, then press Enter in the terminal. The session is saved to `auth_state.json` and reused on future runs.

If Instagram logs you out, delete `auth_state.json` and run again.
## Output

### CSV (`output.csv`)

One row per post with these columns:

| Column | Description |
|---|---|
| `profile` | Instagram username |
| `post_url` | Full URL of the post |
| `date` | ISO 8601 datetime (e.g. `2026-02-23T15:28:13.000Z`) |
| `caption` | Full post caption text |
| `likes` | Like count (as displayed) |
| `image_urls` | Comma-separated CDN image URLs |
| `hashtags` | Comma-separated hashtags from the caption |
| `mentions` | Comma-separated @mentions from the caption |
| `location` | Location tag text (empty if none) |
| `media_type` | `photo`, `video`, or `carousel` |
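A minimal sketch of how rows with these columns could be written using the standard library's `csv` module; the field order matches the table above, and the sample row values are made up:

```python
import csv

FIELDS = ["profile", "post_url", "date", "caption", "likes",
          "image_urls", "hashtags", "mentions", "location", "media_type"]

# One made-up row for illustration.
rows = [{
    "profile": "username1",
    "post_url": "https://www.instagram.com/p/XXXX/",
    "date": "2026-02-23T15:28:13.000Z",
    "caption": "Sunset run #running",
    "likes": "1,024",
    "image_urls": "https://cdn.example/img1.jpg",
    "hashtags": "running",
    "mentions": "",
    "location": "",
    "media_type": "photo",
}]

with open("output.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()   # column header row
    writer.writerows(rows)  # one row per post; commas in values are quoted
```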
### Markdown (`output.md`)

Same data, grouped by profile. Each post is a section with all fields as a bullet list.
## Error handling
- Private or unavailable profiles: skipped with a warning, scraping continues
- Individual post failures: skipped with a warning, scraping continues
- Missing fields: stored as empty string, no crash
- Keyboard interrupt (Ctrl+C): saves whatever has been collected so far
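The skip-and-continue behavior can be sketched as below; `scrape_post` and `scrape_all` are stand-ins for illustration, not the actual functions in `scraper.py`:

```python
def scrape_post(url: str) -> dict:
    # Stand-in for the real Playwright-based extraction.
    if "BBB" in url:
        raise ValueError("post unavailable")
    return {"post_url": url}

def scrape_all(post_urls: list[str]) -> list[dict]:
    results = []
    try:
        for url in post_urls:
            try:
                results.append(scrape_post(url))
            except Exception as exc:
                # Individual post failure: warn and keep going.
                print(f"warning: skipping {url}: {exc}")
    except KeyboardInterrupt:
        # Ctrl+C: fall through and return whatever was collected so far.
        print("interrupted, returning partial results")
    return results

print(scrape_all(["https://www.instagram.com/p/AAA/",
                  "https://www.instagram.com/p/BBB/"]))
```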
## Files

```
instagram-scraper/
├── scraper.py            # main script
├── profiles.txt          # input: list of profile URLs
├── pyproject.toml        # project metadata and dependencies
├── tests/
│   └── test_parsers.py   # unit tests for parsing functions
├── auth_state.json       # saved session (created on first run, gitignored)
├── output.csv            # results (gitignored)
└── output.md             # results (gitignored)
```
## Running tests

```sh
uv run pytest tests/ -v
```
## Notes
- Works with public profiles. Private profiles are skipped.
- Instagram rate-limits aggressive scraping. The script adds a 1.5s wait between post requests.
- Session cookies expire periodically. Delete `auth_state.json` to re-authenticate.
- Image URLs are CDN URLs that expire after some time; download them promptly if needed.