# Instagram Profile Scraper

Scrapes the last N posts from a list of Instagram profiles and saves the results as CSV and Markdown. Uses Playwright with a real authenticated browser session — no API keys required.

## What it does

1. Reads a list of Instagram profile URLs from `profiles.txt`
2. Opens a Chromium browser (visible, non-headless)
3. On first run: prompts you to log in manually, then saves the session to `auth_state.json`
4. On subsequent runs: reuses the saved session automatically
5. Visits each profile and collects the last `POSTS_PER_PROFILE` post URLs (default 5)
6. Visits each post and extracts: date, caption, likes, image URLs, hashtags, mentions, location, media type
7. Writes combined results to `output.csv` and `output.md`

## Setup

Requires Python 3.11+ and [uv](https://docs.astral.sh/uv/).

```bash
git clone <repo-url>
cd instagram-scraper
uv sync
uv run playwright install chromium
```

## Configuration

**`profiles.txt`** — one Instagram profile URL per line. Lines can optionally be prefixed with a number and a tab (the scraper strips them):

```
https://www.instagram.com/username1/
https://www.instagram.com/username2/
https://www.instagram.com/username3/
```

**Constants in `scraper.py`** (edit directly):

| Constant | Default | Description |
|---|---|---|
| `POSTS_PER_PROFILE` | `5` | How many posts to scrape per profile |
| `PROFILES_FILE` | `profiles.txt` | Input file path |
| `OUTPUT_CSV` | `output.csv` | CSV output path |
| `OUTPUT_MD` | `output.md` | Markdown output path |

## Usage

```bash
uv run python scraper.py
```

On first run, a Chromium window opens. Log in to Instagram, then press Enter in the terminal. The session is saved to `auth_state.json` and reused on future runs. If Instagram logs you out, delete `auth_state.json` and run again.

## Output

### CSV (`output.csv`)

One row per post with these columns:

| Column | Description |
|---|---|
| `profile` | Instagram username |
| `post_url` | Full URL of the post |
| `date` | ISO 8601 datetime (e.g. `2026-02-23T15:28:13.000Z`) |
| `caption` | Full post caption text |
| `likes` | Like count (as displayed) |
| `image_urls` | Comma-separated CDN image URLs |
| `hashtags` | Comma-separated hashtags from the caption |
| `mentions` | Comma-separated @mentions from the caption |
| `location` | Location tag text (empty if none) |
| `media_type` | `photo`, `video`, or `carousel` |

### Markdown (`output.md`)

Same data, grouped by profile. Each post is a section with all fields as a bullet list.

## Error handling

- Private or unavailable profiles: skipped with a warning; scraping continues
- Individual post failures: skipped with a warning; scraping continues
- Missing fields: stored as an empty string, no crash
- Keyboard interrupt (Ctrl+C): saves whatever has been collected so far

## Files

```
instagram-scraper/
├── scraper.py          # main script
├── profiles.txt        # input: list of profile URLs
├── pyproject.toml      # project metadata and dependencies
├── tests/
│   └── test_parsers.py # unit tests for parsing functions
├── auth_state.json     # saved session (created on first run, gitignored)
├── output.csv          # results (gitignored)
└── output.md           # results (gitignored)
```

## Running tests

```bash
uv run pytest tests/ -v
```

## Notes

- Works with public profiles; private profiles are skipped.
- Instagram rate-limits aggressive scraping. The script waits 1.5 s between post requests.
- Session cookies expire periodically. Delete `auth_state.json` to re-authenticate.
- Image URLs point to Instagram's CDN and expire after some time — download them promptly if needed.
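The hashtag and mention extraction covered by `tests/test_parsers.py` can be done with simple regular expressions. A minimal sketch of what such helpers might look like — the function names here are illustrative assumptions, not necessarily the actual API of `scraper.py`:

```python
import re


def extract_hashtags(caption: str) -> list[str]:
    # Hypothetical helper: collect #hashtags from a caption,
    # in order of appearance, without the leading '#'.
    return re.findall(r"#(\w+)", caption)


def extract_mentions(caption: str) -> list[str]:
    # Hypothetical helper: collect @mentions from a caption;
    # Instagram usernames may contain letters, digits, '_' and '.'.
    return re.findall(r"@([\w.]+)", caption)


caption = "Sunset in Lisbon #travel #sunset with @friend.one"
print(extract_hashtags(caption))  # ['travel', 'sunset']
print(extract_mentions(caption))  # ['friend.one']
```

Keeping these as small pure functions is what makes them unit-testable without a browser session.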