# Instagram Profile Scraper

Scrapes the last N posts from a list of Instagram profiles and saves the results as CSV and Markdown. Uses Playwright with a real authenticated browser session — no API keys required.

## What it does

1. Reads a list of Instagram profile URLs from `profiles.txt`
2. Opens a Chromium browser (visible, non-headless)
3. On the first run: prompts you to log in manually, then saves the session to `auth_state.json`
4. On subsequent runs: reuses the saved session automatically
5. Visits each profile and collects the last 5 post URLs
6. Visits each post and extracts: date, caption, likes, image URLs, hashtags, mentions, location, media type
7. Writes combined results to `output.csv` and `output.md`

## Setup

Requires Python 3.11+ and uv.

```shell
git clone <repo-url>
cd instagram-scraper
uv sync
uv run playwright install chromium
```

## Configuration

`profiles.txt` lists one Instagram profile URL per line. Lines can optionally be prefixed with a number and a tab (the scraper strips them):

```
https://www.instagram.com/username1/
https://www.instagram.com/username2/
https://www.instagram.com/username3/
```
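
Stripping that optional number-and-tab prefix could look roughly like this (a sketch; `clean_profile_line` and `load_profiles` are hypothetical names, not necessarily what `scraper.py` uses):

```python
import re

def clean_profile_line(line: str) -> str:
    """Strip an optional leading 'N<tab>' prefix and surrounding whitespace."""
    # e.g. "12\thttps://www.instagram.com/username1/" -> the bare URL
    return re.sub(r"^\d+\t", "", line.strip())

def load_profiles(text: str) -> list[str]:
    """Return non-empty, cleaned profile URLs from the contents of profiles.txt."""
    return [clean_profile_line(line) for line in text.splitlines() if line.strip()]
```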

Constants in `scraper.py` (edit directly):

| Constant | Default | Description |
| --- | --- | --- |
| `POSTS_PER_PROFILE` | `5` | How many posts to scrape per profile |
| `PROFILES_FILE` | `profiles.txt` | Input file path |
| `OUTPUT_CSV` | `output.csv` | CSV output path |
| `OUTPUT_MD` | `output.md` | Markdown output path |

## Usage

```shell
uv run python scraper.py
```

On the first run, a Chromium window opens. Log in to Instagram, then press Enter in the terminal. The session is saved to `auth_state.json` and reused on future runs.

If Instagram logs you out, delete `auth_state.json` and run again.

## Output

### CSV (`output.csv`)

One row per post with these columns:

| Column | Description |
| --- | --- |
| `profile` | Instagram username |
| `post_url` | Full URL of the post |
| `date` | ISO 8601 datetime (e.g. `2026-02-23T15:28:13.000Z`) |
| `caption` | Full post caption text |
| `likes` | Like count (as displayed) |
| `image_urls` | Comma-separated CDN image URLs |
| `hashtags` | Comma-separated hashtags from the caption |
| `mentions` | Comma-separated `@mentions` from the caption |
| `location` | Location tag text (empty if none) |
| `media_type` | `photo`, `video`, or `carousel` |
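
The `hashtags` and `mentions` columns can be derived from the caption with simple regexes. A sketch (the character classes Instagram actually allows are broader than shown here):

```python
import re

def extract_hashtags(caption: str) -> list[str]:
    # Word characters only; real hashtags also allow many Unicode letters
    return re.findall(r"#(\w+)", caption)

def extract_mentions(caption: str) -> list[str]:
    # Usernames: letters, digits, dots, underscores
    return re.findall(r"@([\w.]+)", caption)
```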

### Markdown (`output.md`)

Same data, grouped by profile. Each post is a section with all fields as a bullet list.
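
Grouping rows by profile for the Markdown file could look roughly like this (a hypothetical helper; heading levels and field order are illustrative):

```python
from collections import defaultdict

def rows_to_markdown(rows: list[dict]) -> str:
    """Render scraped rows as Markdown, grouped by profile."""
    by_profile: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        by_profile[row["profile"]].append(row)

    lines: list[str] = []
    for profile, posts in by_profile.items():
        lines.append(f"## {profile}")
        for post in posts:
            lines.append(f"### {post['post_url']}")
            for key, value in post.items():
                if key not in ("profile", "post_url"):
                    lines.append(f"- **{key}**: {value}")
    return "\n".join(lines)
```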

## Error handling

- Private or unavailable profiles are skipped with a warning; scraping continues.
- Individual post failures are skipped with a warning; scraping continues.
- Missing fields are stored as empty strings rather than crashing.
- A keyboard interrupt (Ctrl+C) saves whatever has been collected so far.
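
The Ctrl+C behaviour amounts to catching `KeyboardInterrupt` around the main loop and writing whatever has accumulated. A sketch with hypothetical callables (`scrape_profile` and `write_results` stand in for the real scraping and output code):

```python
def scrape_all(profiles, scrape_profile, write_results) -> list[dict]:
    """Scrape each profile, saving partial results on Ctrl+C or per-profile failure."""
    rows: list[dict] = []
    try:
        for url in profiles:
            try:
                rows.extend(scrape_profile(url))
            except Exception as exc:  # private/unavailable profile, broken post, etc.
                print(f"warning: skipping {url}: {exc}")
    except KeyboardInterrupt:
        # Ctrl+C is not an Exception, so it escapes the inner handler
        print("interrupted -- saving partial results")
    write_results(rows)  # always write whatever was collected
    return rows
```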

## Files

```
instagram-scraper/
├── scraper.py           # main script
├── profiles.txt         # input: list of profile URLs
├── pyproject.toml       # project metadata and dependencies
├── tests/
│   └── test_parsers.py  # unit tests for parsing functions
├── auth_state.json      # saved session (created on first run, gitignored)
├── output.csv           # results (gitignored)
└── output.md            # results (gitignored)
```

## Running tests

```shell
uv run pytest tests/ -v
```

## Notes

- Works with public profiles only; private profiles are skipped.
- Instagram rate-limits aggressive scraping, so the script waits 1.5 s between post requests.
- Session cookies expire periodically. Delete `auth_state.json` to re-authenticate.
- Image URLs are CDN URLs that expire after some time, so download them promptly if needed.