# Instagram Profile Scraper

Scrapes the last N posts from a list of Instagram profiles and saves the results as CSV and Markdown. Uses Playwright with a real authenticated browser session — no API keys required.

## What it does

1. Reads a list of Instagram profile URLs from `profiles.txt`
2. Opens a Chromium browser (visible, non-headless)
3. On the first run: prompts you to log in manually, then saves the session to `auth_state.json`
4. On subsequent runs: reuses the saved session automatically
5. Visits each profile and collects the last 5 post URLs
6. Visits each post and extracts: date, caption, likes, image URLs, hashtags, mentions, location, media type
7. Writes combined results to `output.csv` and `output.md`

## Setup

Requires Python 3.11+ and uv.

```shell
git clone <repo-url>
cd instagram-scraper
uv sync
uv run playwright install chromium
```

## Configuration

`profiles.txt` lists one Instagram profile URL per line. Lines can optionally be prefixed with a number and a tab (the scraper strips them):

```
https://www.instagram.com/username1/
https://www.instagram.com/username2/
https://www.instagram.com/username3/
```
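
Stripping that optional number-and-tab prefix could look roughly like this (a sketch; `clean_profile_line` and `load_profiles` are hypothetical names, not necessarily what `scraper.py` uses):

```python
import re

def clean_profile_line(line: str) -> str:
    """Strip an optional leading 'N<tab>' prefix and surrounding whitespace."""
    # e.g. "12\thttps://www.instagram.com/username1/" -> the bare URL
    return re.sub(r"^\d+\t", "", line.strip())

def load_profiles(text: str) -> list[str]:
    """Return non-empty, cleaned profile URLs from the contents of profiles.txt."""
    return [clean_profile_line(line) for line in text.splitlines() if line.strip()]
```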

Constants in `scraper.py` (edit directly):

| Constant | Default | Description |
| --- | --- | --- |
| `POSTS_PER_PROFILE` | `5` | How many posts to scrape per profile |
| `PROFILES_FILE` | `profiles.txt` | Input file path |
| `OUTPUT_CSV` | `output.csv` | CSV output path |
| `OUTPUT_MD` | `output.md` | Markdown output path |

## Usage

```shell
uv run python scraper.py
```

On the first run, a Chromium window opens. Log in to Instagram, then press Enter in the terminal. The session is saved to `auth_state.json` and reused on future runs.

If Instagram logs you out, delete `auth_state.json` and run again.

## Output

### CSV (`output.csv`)

One row per post with these columns:

| Column | Description |
| --- | --- |
| `profile` | Instagram username |
| `post_url` | Full URL of the post |
| `date` | ISO 8601 datetime (e.g. `2026-02-23T15:28:13.000Z`) |
| `caption` | Full post caption text |
| `likes` | Like count (as displayed) |
| `image_urls` | Comma-separated CDN image URLs |
| `hashtags` | Comma-separated hashtags from the caption |
| `mentions` | Comma-separated `@mentions` from the caption |
| `location` | Location tag text (empty if none) |
| `media_type` | `photo`, `video`, or `carousel` |
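
The `hashtags` and `mentions` columns can be derived from the caption with simple regexes. A sketch (the character classes Instagram actually allows are broader than shown here):

```python
import re

def extract_hashtags(caption: str) -> list[str]:
    # Word characters only; real hashtags also allow many Unicode letters
    return re.findall(r"#(\w+)", caption)

def extract_mentions(caption: str) -> list[str]:
    # Usernames: letters, digits, dots, underscores
    return re.findall(r"@([\w.]+)", caption)
```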

### Markdown (`output.md`)

Same data, grouped by profile. Each post is a section with all fields as a bullet list.
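
Grouping rows by profile for the Markdown file could look roughly like this (a hypothetical helper; heading levels and field order are illustrative):

```python
from collections import defaultdict

def rows_to_markdown(rows: list[dict]) -> str:
    """Render scraped rows as Markdown, grouped by profile."""
    by_profile: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        by_profile[row["profile"]].append(row)

    lines: list[str] = []
    for profile, posts in by_profile.items():
        lines.append(f"## {profile}")
        for post in posts:
            lines.append(f"### {post['post_url']}")
            for key, value in post.items():
                if key not in ("profile", "post_url"):
                    lines.append(f"- **{key}**: {value}")
    return "\n".join(lines)
```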

## Error handling

- Private or unavailable profiles are skipped with a warning; scraping continues.
- Individual post failures are skipped with a warning; scraping continues.
- Missing fields are stored as empty strings rather than crashing.
- A keyboard interrupt (Ctrl+C) saves whatever has been collected so far.
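
The Ctrl+C behaviour amounts to catching `KeyboardInterrupt` around the main loop and writing whatever has accumulated. A sketch with hypothetical callables (`scrape_profile` and `write_results` stand in for the real scraping and output code):

```python
def scrape_all(profiles, scrape_profile, write_results) -> list[dict]:
    """Scrape each profile, saving partial results on Ctrl+C or per-profile failure."""
    rows: list[dict] = []
    try:
        for url in profiles:
            try:
                rows.extend(scrape_profile(url))
            except Exception as exc:  # private/unavailable profile, broken post, etc.
                print(f"warning: skipping {url}: {exc}")
    except KeyboardInterrupt:
        # Ctrl+C is not an Exception, so it escapes the inner handler
        print("interrupted -- saving partial results")
    write_results(rows)  # always write whatever was collected
    return rows
```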

## Files

```
instagram-scraper/
├── scraper.py           # main script
├── profiles.txt         # input: list of profile URLs
├── pyproject.toml       # project metadata and dependencies
├── tests/
│   └── test_parsers.py  # unit tests for parsing functions
├── auth_state.json      # saved session (created on first run, gitignored)
├── output.csv           # results (gitignored)
└── output.md            # results (gitignored)
```

## Running tests

```shell
uv run pytest tests/ -v
```

## Notes

- Works with public profiles only; private profiles are skipped.
- Instagram rate-limits aggressive scraping, so the script waits 1.5 s between post requests.
- Session cookies expire periodically. Delete `auth_state.json` to re-authenticate.
- Image URLs are CDN URLs that expire after some time, so download them promptly if needed.