
Introduction:
Every `git commit` tells a story, not just about code but about the person who wrote it. Public GitHub repositories, home to over 180 million developers, are a treasure trove of forgotten metadata: personal email addresses, corporate aliases, usernames, and even GPG key fingerprints. While many developers carefully sanitize their public profiles, they often overlook that every commit records both an author and a committer identity. When these diverge, they create a correlation point that OSINT tools like Gitcolombo exploit to link seemingly unrelated accounts. This article explores how threat actors, red teams, and security researchers extract real identities from Git history, and provides a hands-on guide to using these techniques for both offensive reconnaissance and defensive exposure auditing.
Learning Objectives:
- Identify and extract hidden personally identifiable information (PII) and contact data from public Git commit logs.
- Leverage automated OSINT tools to map contributor networks, codebase relationships, and identity correlations across GitHub organizations.
- Understand how to audit your organization’s public GitHub exposure and implement mitigations against identity leakage.
You Should Know:
1. The Author–Committer Mismatch: Why Git History Leaks More Than You Think
Git distinguishes between the author (who wrote the code) and the committer (who applied it to the repository), and this distinction is a common source of identity leakage. Suppose a developer commits with their work email, then switches `user.email` to a personal address and runs `git commit --amend`. By default, amending rewrites the committer but leaves the original author intact (only `git commit --amend --reset-author` replaces the author as well). The result is two distinct identities attached to the same commit, a bridge that OSINT tools can cross.
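To see the mismatch in a repository you have cloned, the sketch below asks `git log` for the author and committer of every commit on one line, separated by the ASCII unit-separator byte (`%x1f` in Git's format syntax), and keeps the commits where the two emails differ. It is a minimal illustration of the signal described above, not Gitcolombo's own code; the helper names `read_log`, `parse_log`, and `find_mismatches` are hypothetical.

```python
import subprocess
from dataclasses import dataclass

# ASCII unit separator: Git's format expands %x1f to this byte, and it is
# very unlikely to appear inside a name or email.
SEP = "\x1f"
# %H = commit hash, %an/%ae = author name/email, %cn/%ce = committer name/email.
LOG_FORMAT = "%x1f".join(["%H", "%an", "%ae", "%cn", "%ce"])


@dataclass
class Commit:
    sha: str
    author_name: str
    author_email: str
    committer_name: str
    committer_email: str


def read_log(repo_path: str) -> str:
    """Return raw `git log` output for every branch and tag in repo_path."""
    return subprocess.run(
        ["git", "-C", repo_path, "log", "--all", f"--format={LOG_FORMAT}"],
        capture_output=True, text=True, check=True,
    ).stdout


def parse_log(raw: str) -> list:
    """Turn each non-empty output line into a Commit with five identity fields."""
    return [Commit(*line.split(SEP)) for line in raw.split("\n") if line]


def find_mismatches(commits: list) -> list:
    """Keep commits whose author and committer emails differ (case-insensitive)."""
    return [c for c in commits
            if c.author_email.lower() != c.committer_email.lower()]
```

For example, `find_mismatches(parse_log(read_log("target_repo")))` lists the bridging commits in a local clone. Expect benign hits too: commits made through GitHub's web interface, including many merged pull requests, are typically committed by `GitHub <noreply@github.com>`.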
Gitcolombo, an open-source Python CLI tool developed by soxoj, automates the extraction and correlation of these identities. It clones repositories, parses `git log` output, and optionally queries the GitHub API to enrich signals. The tool outputs per-person details including names, emails, author/committer counts, and crucially, other identities that may belong to the same person—as well as different names tied to the same email.
Step‑by‑step: Installing and Running Gitcolombo
Prerequisites: Python 3.10+, Git binary, and pip.
```bash
# Install via pip
pip install gitcolombo

# Or install from source
git clone https://github.com/soxoj/gitcolombo
cd gitcolombo
pip install -e .
```
Basic Usage Examples:
```bash
# Scan a single repository from its URL
gitcolombo -u https://github.com/username/repository

# Scan a local directory recursively
gitcolombo -d ./target_repo -r

# Scan all public repos of a GitHub user or organization
gitcolombo --nickname octocat

# API-only: find emails for a username without cloning
gitcolombo --search Soxoj
```
Remote repositories are cloned into `./repos/` by default; override this with `--repos-dir`. For batch cloning from GitLab and Bitbucket groups, consider using ghorg.
2. Manual Git Forensics: Extracting Emails and Identities Without Third-Party Tools
Before diving into automated tools, it is essential to understand the underlying Git commands that power them. These manual techniques are invaluable for quick reconnaissance and for validating automated outputs.
Linux/macOS Commands:
```bash
# Clone the target repository
git clone https://github.com/target_org/target_repo.git
cd target_repo

# Extract unique author emails
git log --format='%ae' | sort -u > extracted_emails.txt

# Extract names and emails together for correlation
git log --format='%an <%ae>' | sort -u > author_contacts.txt

# Extract committer emails (often different from author)
git log --format='%ce' | sort -u > committer_emails.txt

# Full commit metadata with timestamps
git log --format='%an <%ae> %cd' --date=short > full_audit.txt
```
Windows PowerShell Approach:
```powershell
# Clone the repository
git clone https://github.com/target_org/target_repo.git
cd target_repo

# Extract unique author emails
git log --format='%ae' | Sort-Object -Unique | Out-File extracted_emails.txt

# Extract names and emails
git log --format='%an <%ae>' | Sort-Object -Unique | Out-File author_contacts.txt
```
These commands surface every email address recorded in the commits reachable from the checked-out branch, including addresses that the developer later changed or removed from their public profile. Add `--all` to the `git log` commands to cover every branch and tag as well.
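Once the addresses are extracted, a quick triage step separates GitHub's privacy-preserving noreply addresses from addresses that point to a real mailbox, then counts the exposed addresses per domain so corporate domains stand out. This is a sketch that assumes one address per line, as in `extracted_emails.txt`; the function name `triage_emails` is hypothetical. Note that Windows PowerShell 5.1's `Out-File` writes UTF-16 by default, so add `-Encoding utf8` there before feeding its output to this function.

```python
from collections import Counter

# GitHub's per-user privacy addresses, e.g. "12345+octocat@users.noreply.github.com".
NOREPLY_DOMAIN = "users.noreply.github.com"
# Committer address GitHub uses for commits made through its web interface.
WEB_COMMITTER = "noreply@github.com"


def triage_emails(lines):
    """Return sorted (noreply, exposed) address lists plus a per-domain count of exposed ones."""
    noreply, exposed = set(), set()
    for line in lines:
        email = line.strip().lower()
        if "@" not in email:
            continue  # skip blank or malformed lines
        domain = email.rsplit("@", 1)[1]
        if domain == NOREPLY_DOMAIN or email == WEB_COMMITTER:
            noreply.add(email)
        else:
            exposed.add(email)
    domains = Counter(email.rsplit("@", 1)[1] for email in exposed)
    return sorted(noreply), sorted(exposed), domains
```

For example, open the file with `encoding="utf-8-sig"` (which also strips the byte-order mark PowerShell's `-Encoding utf8` adds), pass the file object to `triage_emails`, and call `most_common()` on the returned counter to rank exposed domains.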
3. Automated Correlation: How Gitcolombo Maps Identities Across Repositories
Gitcolombo goes beyond simple extraction. It correlates identities by analyzing:
- Emails that share a common name – suggesting the same person uses multiple aliases.
- Different names tied to the same email – indicating a single account used by multiple individuals or a person with multiple pseudonyms.
- Author/committer mismatches – where one person authored a commit but another applied it, creating a link between two accounts.
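The three signals above amount to a graph whose nodes are names and emails, with edges for every (name, email) pair seen in a commit and for every author/committer email pair; connected components are candidate "same person" clusters. The sketch below shows that general technique with a small union-find. It is an illustration under those assumptions, not Gitcolombo's actual algorithm, and the names `UnionFind` and `correlate` are hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Minimal disjoint-set structure with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def correlate(commits, ignored_emails=frozenset({"noreply@github.com"})):
    """Cluster identities from (author_name, author_email, committer_name, committer_email) tuples.

    A name and email seen together are linked (covering "one name, many emails"
    and "one email, many names"), and a commit's author and committer emails
    are linked (the mismatch bridge).
    """
    uf = UnionFind()
    for author_name, author_email, committer_name, committer_email in commits:
        a_mail = ("email", author_email.lower())
        c_mail = ("email", committer_email.lower())
        uf.union(("name", author_name), a_mail)
        if c_mail[1] in ignored_emails:
            continue  # GitHub's web-UI committer is shared by everyone; linking it would merge all clusters
        uf.union(("name", committer_name), c_mail)
        uf.union(a_mail, c_mail)
    clusters = defaultdict(set)
    for node in list(uf.parent):
        clusters[uf.find(node)].add(node)
    return [sorted(cluster) for cluster in clusters.values()]
```

Each tuple can be built by splitting one line of `git log --format='%an%x1f%ae%x1f%cn%x1f%ce'` output on `\x1f`. In practice, very common names (or placeholders such as "root") will over-merge clusters, so real tools need extra heuristics on top of this idea.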
To generate a structured report:
```bash
gitcolombo -u https://github.com/target_org/target_repo.git --output report.json
```
The JSON output provides a machine-readable format suitable for further analysis, integration with threat intelligence platforms, or feeding into custom dashboards.
4. Web-Based OSINT: The No-Install Gitcolombo HTML Interface
For analysts who cannot install Python packages or prefer a browser-based approach, Gitcolombo offers a single static HTML file (gitcolombo.html) that queries the GitHub API directly. A hosted version is available at https://gitcolombo.soxoj.com. This web interface allows for rapid, zero-footprint reconnaissance—ideal for quick lookups or situations where installing tools is not feasible.
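A browser-based tool can work without cloning because GitHub's REST API returns author and committer metadata for each commit. The sketch below uses the documented `GET /repos/{owner}/{repo}/commits` endpoint, whose JSON list items carry `commit.author` and `commit.committer` objects with `name` and `email` fields. It uses only the standard library, fetches a single unauthenticated page of up to 100 commits, and is an illustration of the approach, not how `gitcolombo.html` is implemented; the function names are hypothetical.

```python
import json
import urllib.request

API = "https://api.github.com/repos/{owner}/{repo}/commits?per_page=100"


def fetch_commits(owner: str, repo: str) -> list:
    """Fetch one page of commits (unauthenticated) from the GitHub REST API."""
    req = urllib.request.Request(
        API.format(owner=owner, repo=repo),
        headers={"Accept": "application/vnd.github+json",
                 "User-Agent": "identity-audit-sketch"},  # GitHub requires a User-Agent
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def identities_from_commits(commits: list) -> set:
    """Collect (role, name, email) tuples from the git metadata of each commit."""
    found = set()
    for item in commits:
        meta = item.get("commit", {})
        for role in ("author", "committer"):
            person = meta.get(role) or {}
            if person.get("email"):
                found.add((role, person.get("name", ""), person["email"].lower()))
    return found
```

For example, `identities_from_commits(fetch_commits("octocat", "Hello-World"))` returns the identities on the first page of that repository's default-branch history. Anonymous requests are limited to 60 per hour; authenticating with a token raises that limit considerably.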
5. Defensive Measures: Hardening Your Organization’s Git Exposure
From a blue-team perspective, the existence of these tools underscores the need for rigorous identity hygiene. Recommendations include:
- Use GitHub’s private email feature: Enable the “Keep my email address private” setting and use the provided `noreply` email address for commits.
- Audit commit history: Regularly scan your public repositories using tools like Gitcolombo to identify unintentionally exposed corporate emails or personal aliases.
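- Squash and rebase with care: `git commit --amend` keeps the original author and replaces only the committer, so amending alone does not remove an old author email. Use `git commit --amend --reset-author` to update both fields, and remember that rewriting history does not remove commits you have already pushed from existing clones and forks.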
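- Leverage `.mailmap` files with care: Git's `.mailmap` maps old names and emails to canonical ones in the output of commands such as `git shortlog` and `git blame`, which unifies identities for display. It does not rewrite commits, so the original addresses remain in the history, and a published `.mailmap` explicitly lists which aliases belong to the same person.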
- Implement pre-commit hooks: Use client-side Git hooks to validate that commit emails match approved corporate domains before pushing to public remotes.
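A client-side check like the one in the last bullet can be written as a `pre-commit` hook. This sketch assumes Python 3 is on the PATH; it reads the identities Git is about to record via `git var GIT_AUTHOR_IDENT` and `git var GIT_COMMITTER_IDENT` and rejects the commit if either email's domain is not on an allow-list. `ALLOWED_DOMAINS` and its entries are hypothetical placeholders, as are the helper names.

```python
#!/usr/bin/env python3
"""Sketch of a pre-commit check: reject commits whose identity uses a non-approved domain."""
import re
import subprocess
import sys

# Hypothetical allow-list: replace with your organization's approved domains.
ALLOWED_DOMAINS = {"users.noreply.github.com", "example.com"}

# `git var GIT_AUTHOR_IDENT` prints "Name <email> timestamp timezone".
IDENT_RE = re.compile(r"<([^>]*)>")


def email_from_ident(ident: str) -> str:
    """Return the lowercased address between angle brackets, or '' if none."""
    match = IDENT_RE.search(ident)
    return match.group(1).strip().lower() if match else ""


def is_allowed(email: str, allowed=ALLOWED_DOMAINS) -> bool:
    """True only if the email's domain exactly matches an allowed domain."""
    return "@" in email and email.rsplit("@", 1)[1] in allowed


def main() -> int:
    """Check the author and committer identities; a return value of 1 aborts the commit."""
    for var in ("GIT_AUTHOR_IDENT", "GIT_COMMITTER_IDENT"):
        ident = subprocess.run(["git", "var", var], capture_output=True,
                               text=True, check=True).stdout
        email = email_from_ident(ident)
        if not is_allowed(email):
            print(f"pre-commit: {var} uses {email!r}, which is not an approved domain.",
                  file=sys.stderr)
            return 1
    return 0
```

To install it, save the block as `.git/hooks/pre-commit`, append `sys.exit(main())` as the final line, and make the file executable (`chmod +x .git/hooks/pre-commit`). Hooks live in each developer's clone and `git commit --no-verify` skips them, so treat this as a safety net rather than an enforcement control.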
6. Advanced OSINT Workflows: Combining Gitcolombo with Other Intelligence Tools
Gitcolombo is most powerful when integrated into a broader OSINT pipeline. For example:
- Feed extracted emails into Hunter.io or Phonebook.cz to discover associated domains and subdomains.
- Cross-reference usernames with Maigret (another tool by the same author) to map GitHub identities to social media profiles.
- Use extracted corporate email patterns to build targeted phishing simulations or to validate the attack surface of an organization.
A sample pipeline in Python:
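```python
import json
import subprocess

# Run gitcolombo and capture output. The --output flag and report keys
# ('persons', 'name', 'emails') follow this article; verify them for your version.
result = subprocess.run(
    ['gitcolombo', '-u', 'https://github.com/target_org/repo', '--output', 'report.json'],
    capture_output=True,
    text=True,
)
if result.returncode != 0:
    raise SystemExit(f"gitcolombo failed: {result.stderr}")

# Load the report and print each correlated person
with open('report.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

for person in data.get('persons', []):
    print(f"Name: {person.get('name')}, Emails: {person.get('emails')}")
```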
What Undercode Says:
- Key Takeaway 1: Git history is a persistent, often overlooked source of PII. Every commit is a data point that can be correlated to build a detailed identity graph, regardless of current privacy settings.
- Key Takeaway 2: Automated OSINT tools like Gitcolombo lower the barrier to entry for identity reconnaissance, making it accessible to both security professionals and malicious actors. Defensive teams must adopt the same tools to understand their own exposure.
Analysis: The intersection of version control and OSINT represents a paradigm shift in how we think about public data. What was once considered benign metadata—commit authors, timestamps, and email addresses—is now a critical intelligence vector. The sheer scale of GitHub (180 million+ developers) means that the attack surface is vast and largely unmanaged. Organizations often focus on code security (e.g., preventing secret leaks) but neglect identity exposure. Yet, as this article demonstrates, a single corporate email address extracted from a five-year-old commit can be the entry point for a social engineering attack, a credential stuffing campaign, or a targeted phishing operation. The defensive posture must evolve from “protect the code” to “protect the metadata”—and that requires continuous monitoring, automated scanning, and a culture of identity hygiene among developers.
Prediction:
- +1 The growing awareness of Git-based OSINT will drive the development of new defensive tools and services, creating a niche market for GitHub exposure auditing and automated identity sanitization.
- -1 As threat actors increasingly incorporate Git history analysis into their reconnaissance phase, we will see a rise in sophisticated social engineering attacks that leverage extracted corporate emails and personal aliases to bypass traditional security controls.
- -1 The legal and regulatory landscape will struggle to keep pace, as the extraction of publicly available commit metadata occupies a gray area between open-source intelligence and privacy violation, leading to contentious debates and potential lawsuits.
- +1 Open-source projects will adopt `.mailmap` and other canonicalization techniques more widely, reducing the efficacy of automated correlation tools and forcing OSINT practitioners to develop more sophisticated heuristics.
- -1 Despite defensive improvements, the sheer volume of historical commit data means that legacy exposure will remain exploitable for years, creating a persistent risk for organizations that have not conducted thorough historical audits.
- +1 Integration of Gitcolombo-style analysis into red-team engagements will become standard practice, improving the realism and effectiveness of penetration tests and ultimately strengthening organizational security postures.
- -1 The democratization of these OSINT techniques means that even low-skill attackers can now perform sophisticated identity reconnaissance, lowering the barrier to entry for targeted attacks against developers and open-source maintainers.
Reported By: Mariosantella Osint – Hackers Feeds


