# leadgen — Complete Documentation

**Apollo-class local lead generation from Google Maps. No API. Streams leads
straight into Notion, Google Sheets, or CSV with live progress and guaranteed
zero duplicates.**

Built by Auradevs. One tool, four ways to run it, sellable on LemonSqueezy.

---

## Table of contents
1. [What it is & why it beats Apollo for local](#1-what-it-is)
2. [The four instances](#2-the-four-instances)
3. [Install (step by step)](#3-install)
4. [Connect a destination (Notion / Sheets / CSV)](#4-connectors)
5. [Usage — the ultimate step-by-step](#5-usage)
6. [Every command & flag](#6-cli-reference)
7. [The field set (Apollo-grade)](#7-fields)
8. [Run it from an AI agent (Claude / Copilot / Codex)](#8-ai-agents)
9. [Host it (runs with your computer off) + email](#9-hosting)
10. [Distribution (selling on LemonSqueezy)](#10-distribution)
11. [Edge cases handled](#11-edge-cases-handled)
12. [Troubleshooting & FAQ](#12-troubleshooting)

---

## 1. What it is
You give it a **niche + city + how many leads**. It opens Google Maps in a real
(headless) browser, scrolls the results, opens each listing, and extracts a rich
field set — then pushes each lead **directly** into your database as it's found.

**Why it beats Apollo for local SMBs:** Apollo is built for corporate B2B and is
weak on local businesses (and gates emails/phones behind plans). leadgen pulls
**phone, website, socials, geo-coordinates, Google Place ID, rating, review
count + review text, opening hours, price level, photos/logo, and a computed
Lead Score** — data Apollo simply doesn't have for a neighbourhood café. It also
filters to businesses **without a website** (your prime web-design prospects),
which Apollo can't do at all.

---

## 2. The four instances
The same engine, packaged four ways (all from this one repo):

| # | Instance | How |
|---|----------|-----|
| 1 | **Executable** | `python -m leadgen_maps run …` — runs directly from the source tree. |
| 2 | **Installable package** | `pip install .` gives a global `leadgen` command (this is what a buyer installs). |
| 3 | **AI-agent + manual** | A skill file (`skills/leadgen/SKILL.md`) lets Claude/Copilot/Codex drive it with `--json`; humans use the same CLI. |
| 4 | **Terminal / hosted** | Runs in any terminal; Dockerfile + worker run it on a server. **Requires internet** — it exits with a clear error offline. |

---

## 3. Install
> **Bought the packaged binary and not a developer?** Skip this section — follow
> **[BUYER-GUIDE.md](BUYER-GUIDE.md)** instead (per-OS, no Python needed).

**Prerequisites:** Python 3.9+ and an internet connection.

```bash
# 1. get the code
git clone https://github.com/subhadeeproy3902/lead-gen.git
cd lead-gen

# 2. install the package (gives you the `leadgen` command)
pip install .
#    …or with Google Sheets support:
pip install ".[sheets]"

# 3. install the browser it drives (one time)
python -m playwright install chromium

# 4. configure secrets
cp .env.example .env        # then edit .env

# 5. verify everything
leadgen doctor
```

> **PATH note (Windows):** if `leadgen` isn't found after install, either add
> Python's `Scripts` folder to PATH, or just use `python -m leadgen_maps …` — it's
> identical and always works.

`leadgen doctor` should report `internet: OK`, your connector as configured, and
`Playwright: installed`. Fix anything it flags before running.

---

## 4. Connectors
Pick one or more with `--to` (a comma-separated list, e.g. `--to gsheets,csv`).
leadgen creates the columns it needs and **dedupes against the destination**, so
duplicates never happen.

### Notion (recommended)
1. Create an integration at <https://www.notion.so/my-integrations> → copy the token.
2. Create (or pick) a database, open it, **••• → Connections → add your integration**.
3. Copy the database id (the 32-char chunk in its URL).
4. In `.env`: `NOTION_TOKEN=…` and `NOTION_DATABASE_ID=…`.
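
If you would rather not pick the id out of the URL by eye, the extraction in step 3 can be sketched in a few lines of Python. This is a convenience sketch, not part of leadgen: the function name `notion_database_id` is hypothetical, and it assumes the id is the last 32 hexadecimal characters of the URL path (the `?v=…` query value is a view id, not the database id).

```python
import re
from urllib.parse import urlparse


def notion_database_id(url: str) -> str:
    """Return the 32-character hex id found at the end of a Notion URL path."""
    # Use only the path: the ?v=<view id> query value is also 32 hex characters.
    path = urlparse(url).path.rstrip("/")
    # Ids sometimes appear in dashed UUID form, so drop hyphens before matching.
    compact = path.replace("-", "")
    match = re.search(r"[0-9a-fA-F]{32}$", compact)
    if match is None:
        raise ValueError(f"no 32-character id found in {url!r}")
    return match.group(0).lower()
```

Paste the returned value into `NOTION_DATABASE_ID` in `.env`.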

### Google Sheets
1. Google Cloud Console → create a **Service Account** + JSON key; enable the **Sheets API**.
2. Share your target Sheet with the service-account email (…`iam.gserviceaccount.com`) as **Editor**.
3. In `.env`: `GOOGLE_SERVICE_ACCOUNT_JSON=C:/path/key.json`, `GSHEET_ID=…`, `GSHEET_TAB=Leads`.
4. Install support: `pip install ".[sheets]"`.

### CSV
Nothing to configure. `--to csv` writes `leads.csv` (a clean table view) in the
current folder, appending + deduping across runs.
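
To make "appending + deduping across runs" concrete, here is a minimal sketch of the idea using only the standard library. It is an illustration under assumptions, not leadgen's actual writer: the function name `append_new_rows` and the `place_id` column key are hypothetical.

```python
import csv
from pathlib import Path


def append_new_rows(path, rows, key="place_id"):
    """Append rows whose `key` value is not already in the CSV; return the count written."""
    path = Path(path)
    seen, header = set(), None
    if path.exists() and path.stat().st_size > 0:
        with path.open(newline="", encoding="utf-8") as fh:
            reader = csv.DictReader(fh)
            header = reader.fieldnames  # reuse the existing column order
            seen = {row[key] for row in reader if row.get(key)}
    fresh = []
    for row in rows:
        value = row.get(key)
        if value and value in seen:
            continue  # already present from this run or an earlier one
        if value:
            seen.add(value)
        fresh.append(row)
    if not fresh:
        return 0
    with path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(
            fh, fieldnames=header or list(fresh[0]), extrasaction="ignore"
        )
        if header is None:
            writer.writeheader()  # first write to a new or empty file
        writer.writerows(fresh)
    return len(fresh)
```

Reading the existing file before each append is what lets a later run skip leads written by an earlier one, at the cost of one full read per run.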

---

## 5. Usage
The ultimate step-by-step:

```bash
# Step 1 — sanity check
leadgen doctor

# Step 2 — run a job (niche + city + how many + where)
leadgen run --niche "cafe" --location "Kolkata" --limit 30 --to notion

# Step 3 — watch the live bar
#   [██████████░░░░] 21/30 leads (70%) │ #54 of 110 scanned │ elapsed 3m │ ETA 1m20s │ Blue Tokai

# Step 4 — read the summary
#   ✓ 30 new leads (target 30) → notion: https://www.notion.so/<db>
```

More examples:
```bash
# Businesses that DO have a website (e.g. redesign prospects)
leadgen run --niche "dentist" --location "Pune" --limit 40 --website with --to notion

# Any website status, into Sheets + CSV, Apollo column subset, UAE calling code (971)
leadgen run --niche "gym" --location "Dubai" --limit 50 --website any --to gsheets,csv --fields apollo --cc 971

# Big batch with an email when it finishes
leadgen run --niche "salon" --location "Kolkata" --limit 200 --to notion --email you@auradevs.co
```

**Duplicates:** before each run, leadgen reads the existing leads from the
destination and skips any match on Place ID, then phone, then name + address.
Ask for "30 cafés in Kolkata" today and "30 more" next week and there is no
overlap, even if local files are gone.
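
The matching order used for dedupe (Place ID, then phone, then name + address, as listed under Edge cases) can be sketched as follows. This is a minimal illustration, not leadgen's internal code: the function names and the `place_id`, `phone`, `name`, and `address` keys are assumptions, and the phone normalisation simply keeps digits.

```python
import re


def dedupe_keys(lead: dict) -> list:
    """Return every identity key a lead offers, strongest first."""
    keys = []
    place_id = (lead.get("place_id") or "").strip()
    if place_id:
        keys.append("place:" + place_id)
    digits = re.sub(r"\D", "", lead.get("phone") or "")  # keep digits only
    if digits:
        keys.append("phone:" + digits)
    name = " ".join((lead.get("name") or "").lower().split())
    address = " ".join((lead.get("address") or "").lower().split())
    if name and address:
        keys.append(f"name:{name}|{address}")
    return keys


def filter_new(leads, seen: set) -> list:
    """Keep leads that share no key with `seen`, recording their keys as we go."""
    fresh = []
    for lead in leads:
        keys = dedupe_keys(lead)
        if any(key in seen for key in keys):
            continue  # matches something already in the destination
        seen.update(keys)
        fresh.append(lead)
    return fresh
```

Seeding `seen` with the keys of every row already in the destination is what makes the check survive the loss of local files. Note that digit-only matching treats the same number written with and without a country code as different phones.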

**Manual edits:** you can rename/reorder/delete columns in Notion or your Sheet;
leadgen only fills the columns it recognises and leaves your custom ones alone.

---

## 6. CLI reference
```
leadgen run --niche <str> --location <str> [options]
  --limit N            new leads to collect, max 256 (default 30)
  --website MODE       without | with | any   (default without)
  --to LIST            notion,gsheets,csv      (default notion)
  --fields SPEC        default | all | apollo | <comma,list,of,keys>
  --cc CODE            country calling code; 91 ⇒ Region=India (default 91)
  --show               show the browser window (debug)
  --no-reviews         skip review snippets (faster)
  --email ADDR         email a summary when done (needs SMTP in .env)
  --json               stream JSON events (use this from agents/pipelines)

leadgen doctor         check internet, connectors, Playwright
leadgen fields         list the field set ( --json for machine form )
leadgen connectors     list push targets
leadgen --version
```
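
For readers who want to wrap or validate these flags in their own scripts, the `run` options above can be mirrored with `argparse`. This is a sketch built only from the reference listing, not leadgen's real parser; the minimum of 1 for `--limit` is an assumption, as is treating `--cc` as a string.

```python
import argparse


def limit_type(value: str) -> int:
    """Parse --limit and enforce the documented maximum of 256."""
    number = int(value)
    if not 1 <= number <= 256:  # lower bound of 1 is an assumption
        raise argparse.ArgumentTypeError("limit must be between 1 and 256")
    return number


def comma_list(value: str) -> list:
    """Split 'gsheets,csv' into ['gsheets', 'csv']."""
    return [item.strip() for item in value.split(",") if item.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="leadgen")
    sub = parser.add_subparsers(dest="command", required=True)
    run = sub.add_parser("run")
    run.add_argument("--niche", required=True)
    run.add_argument("--location", required=True)
    run.add_argument("--limit", type=limit_type, default=30)
    run.add_argument("--website", choices=["without", "with", "any"], default="without")
    run.add_argument("--to", type=comma_list, default=["notion"])
    run.add_argument("--fields", default="default")
    run.add_argument("--cc", default="91")
    run.add_argument("--show", action="store_true")
    run.add_argument("--no-reviews", action="store_true")
    run.add_argument("--email")
    run.add_argument("--json", action="store_true")
    return parser
```

For example, `build_parser().parse_args(["run", "--niche", "gym", "--location", "Dubai", "--to", "gsheets,csv"])` yields `to=["gsheets", "csv"]` with every other option at its documented default.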

---

## 7. Fields
37 fields across Company, Web, Contact, Social, Location, Reputation, Media,
Firmographic, and Meta groups. Run `leadgen fields` for the live list. Source of
each value:
- **maps** — scraped from the listing (name, category, phone, website, socials, geo, place id, rating, reviews, hours, photos…)
- **derived** — computed (domain, address components, region, tags, lead score…)
- **enrich** — column exists but blank unless you add an enrichment provider (employees, revenue, technologies, founded year) — exactly how Apollo exports blank columns off-plan.

Choose columns with `--fields`: `default` (everything extractable), `all`
(includes enrich columns), `apollo` (mirrors Apollo's company export), or your
own comma list of keys.
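
The relationship between the source tags and the `--fields` presets can be sketched like this. Everything here is illustrative: the seven keys stand in for the real 37 (run `leadgen fields` for the actual keys), and the contents of the `apollo` preset are a guess used only to show the shape.

```python
# Hypothetical miniature registry: field key -> source tag.
FIELDS = {
    "name": "maps",
    "phone": "maps",
    "website": "maps",
    "domain": "derived",
    "lead_score": "derived",
    "employees": "enrich",
    "revenue": "enrich",
}

# Hypothetical preset; the real Apollo-style column set is not shown here.
APOLLO_PRESET = ["name", "website", "phone", "employees", "revenue"]


def resolve_fields(spec: str) -> list:
    """Turn a --fields value into an ordered list of field keys."""
    if spec == "default":
        # Everything extractable: skip columns that need an enrichment provider.
        return [key for key, source in FIELDS.items() if source != "enrich"]
    if spec == "all":
        return list(FIELDS)
    if spec == "apollo":
        return list(APOLLO_PRESET)
    keys = [key.strip() for key in spec.split(",") if key.strip()]
    unknown = [key for key in keys if key not in FIELDS]
    if unknown:
        raise ValueError(f"unknown field keys: {', '.join(unknown)}")
    return keys
```

Rejecting unknown keys up front, rather than silently dropping them, makes a typo in a custom list visible before a long run starts.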

---

## 8. AI agents
leadgen ships a skill at `skills/leadgen/SKILL.md`. For Claude Code, copy it to
`.claude/skills/leadgen/SKILL.md` in your own project (this repo already has it
in place). Any agent (Claude, Copilot, Codex) runs it the same way:

```bash
python -m leadgen_maps run --niche "cafe" --location "Kolkata" --limit 30 --to notion --json
```

With `--json`, every stdout line is an event: `start`, `queue`, `progress`
(`kept`, `target`, `inspected`, `queued`, `percent`, `elapsed_s`, `eta_s`,
`current`), `done`, and a final `summary`. The agent parses `summary` and reports
`kept` vs `target` and the destination link.
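
A pipeline can consume that stream with a small line-by-line reader. The sketch below assumes each line is a JSON object whose type is stored under an `"event"` key; that key name is an assumption, so check a real `--json` run and adjust it. The function name `read_summary` is hypothetical.

```python
import json


def read_summary(lines, on_progress=None):
    """Scan JSON-lines output, forward progress events, return the summary event."""
    summary = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate any stray non-JSON line
        if not isinstance(event, dict):
            continue
        kind = event.get("event")  # assumed key name for the event type
        if kind == "progress" and on_progress is not None:
            on_progress(event)
        elif kind == "summary":
            summary = event
    return summary
```

`lines` can be any iterable of strings, for instance the `stdout` of a `subprocess.Popen(..., stdout=subprocess.PIPE, text=True)` process running the command above. The function returns `None` if the run ended before a `summary` event was emitted.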

---

## 9. Hosting
Run it on a server so jobs finish even with your laptop off, and get an email
when done. Full guide in [`deploy/README.md`](deploy/README.md). In short:

```bash
docker build -t leadgen .
docker run --env-file .env \
  -e LEADGEN_JOBS='[{"niche":"cafe","location":"Kolkata","limit":30,"to":["notion"]}]' \
  leadgen python deploy/worker.py
```
Or deploy `deploy/render.yaml` as a Render Blueprint for a managed schedule.
Set `SMTP_*` + `NOTIFY_EMAIL` (Gmail: use an App Password) to receive summaries.
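
To show how a `LEADGEN_JOBS` value maps onto the CLI, here is a sketch that turns each job object into the equivalent `python -m leadgen_maps run …` argument list. It does not describe what `deploy/worker.py` actually does; the function names are hypothetical, and the fallbacks (`limit` 30, `to` notion) are taken from the CLI defaults.

```python
import json
import sys


def load_jobs(raw: str) -> list:
    """Parse the LEADGEN_JOBS string, which should be a JSON array of job objects."""
    jobs = json.loads(raw)
    if not isinstance(jobs, list):
        raise ValueError("LEADGEN_JOBS must be a JSON array of job objects")
    return jobs


def job_to_argv(job: dict) -> list:
    """Build the command line for one job, falling back to the CLI defaults."""
    return [
        sys.executable, "-m", "leadgen_maps", "run",
        "--niche", job["niche"],
        "--location", job["location"],
        "--limit", str(job.get("limit", 30)),
        "--to", ",".join(job.get("to", ["notion"])),
        "--json",
    ]
```

Each returned list can be passed directly to `subprocess.run`, with `os.environ.get("LEADGEN_JOBS", "[]")` as the input to `load_jobs`.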

---

## 10. Distribution
You sell the package on LemonSqueezy (or anywhere) as a normal downloadable
tool — no licensing code is baked in, so there's nothing to phone home and
nothing to break. Each buyer installs their own copy and runs it with their own
connector credentials. You're selling a tool, not running a platform: no server
to keep up.

- **Python**: ship the repo / a wheel (`pip install .`).
- **Go**: ship the single binary (`leadgen` for their OS) or the one-line
  install script — buyers don't even need Python installed.

See **[DISTRIBUTION.md](DISTRIBUTION.md)** for every install channel (release
binary, `go install`, Homebrew, Scoop, winget) and the public-vs-private caveat,
and **[BUYER-GUIDE.md](BUYER-GUIDE.md)** for the dead-simple per-OS buyer setup.

---

## 11. Edge cases handled
- **Offline** → exits immediately with a clear, actionable error.
- **Interrupted mid-run** → every lead is pushed the instant it's found, so
  partial runs keep everything collected so far.
- **Re-running** → dedupes against the destination by Place ID → phone →
  name+address. No duplicates, ever.
- **Website-heavy niche** → reports the true count honestly (e.g. only 7 of 110
  interior designers had no website) instead of padding.
- **Cookie/consent walls** → auto-dismissed.
- **Rate limiting** → randomised human-like delays between listings.
- **Missing data** (no phone/photo/website) → handled per field; blank, not crash.
- **Bengali / non-Latin names** → UTF-8 throughout.
- **Notion rejects an external photo** → page is still created with everything else.
- **Max 256 / Maps' ~120 cap** → collects up to your limit or whatever Maps
  returns, whichever is smaller, and tells you which.
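
As an illustration of the rate-limiting item above, a randomised pause between listings can be as small as the sketch below. The bounds of 1.5 and 4.0 seconds are placeholders, not leadgen's real settings, and `human_delay` is a hypothetical name.

```python
import random
import time


def human_delay(min_s=1.5, max_s=4.0, rng=random, sleep=time.sleep) -> float:
    """Sleep for a random duration in [min_s, max_s] and return the chosen delay."""
    if not 0 <= min_s <= max_s:
        raise ValueError("need 0 <= min_s <= max_s")
    delay = rng.uniform(min_s, max_s)  # uniform jitter avoids a fixed rhythm
    sleep(delay)
    return delay
```

Passing `rng` and `sleep` as parameters keeps the function easy to test: a seeded `random.Random` and a no-op `sleep` make it deterministic and instant.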

---

## 12. Troubleshooting
| Symptom | Fix |
|--------|------|
| `leadgen: command not found` | Use `python -m leadgen_maps …`, or add Python's `Scripts` dir to PATH. |
| `No internet connection detected` | Reconnect; the tool can't run offline. |
| `Playwright … MISSING` | `python -m playwright install chromium`. |
| `Notion connector needs NOTION_TOKEN…` | Fill `.env`; confirm the DB is shared with your integration. |
| Google Sheets error | `pip install ".[sheets]"`; share the sheet with the service-account email. |
| Fewer leads than `--limit` | The niche is website-heavy or Maps returned fewer listings — expected, not a bug. |
| Browser visible / slow | It's headless by default, so remove `--show` if a window opens; add `--no-reviews` to go faster. |

**FAQ**
- *Does it use the Google Maps API?* No — pure browser automation, no API key, no per-call cost.
- *Will I get duplicates?* No — dedupe is enforced against the destination.
- *Can I edit the columns?* Yes — pick with `--fields`, and edit your Notion/Sheet columns freely.
- *Is scraping allowed?* You're responsible for complying with Google's terms and local law; keep volumes reasonable.
