
Introduction:
Open-source intelligence (OSINT) investigations often drown in fragmented data – scattered emails, names, domains, and coordinates hidden across dozens of visited web pages. Ubikron, a free self‑hosted tool described by investigators, solves this by automatically extracting entities from saved pages and visualizing them as interactive graphs. This article dives into the tool’s graph‑based workflow, replicates its core functionality using open‑source alternatives, and provides step‑by‑step commands for Linux and Windows to supercharge your own OSINT pipeline.
Learning Objectives:
- Understand how automated entity extraction from web pages accelerates link analysis and investigation backtracking.
- Build a local, privacy‑friendly graph database using Python, Neo4j, and browser automation.
- Apply data enrichment and report generation to connect disparate clues from social media and public sources.
You Should Know:
- Extracting Entities from Web Pages – Command‑Line & Browser Automation
The post highlights Ubikron’s ability to save any browsed page and instantly extract emails, names, phone numbers, URLs, and geocoordinates. The same effect can be approximated with regex patterns and HTML parsing. Below is a Python script that covers emails and domains; run it on Linux or Windows after installing the dependencies.
Step‑by‑step: Build your own entity extractor
- Linux/macOS/Windows – Install Python and required libraries:
pip install requests beautifulsoup4 ipwhois folium
- Save a page locally (e.g., target.html) or fetch it via URL:
import requests
from bs4 import BeautifulSoup
import re

url = "https://example.com/news"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
text = soup.get_text()
– Extract emails:
emails = set(re.findall(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', text))
print("Emails:", emails)
– Extract domains from `<a>` tags:
domains = set()
for a in soup.find_all('a', href=True):
    # Only absolute links carry a host; index 2 of the split is the host part.
    if a['href'].startswith(('http://', 'https://')):
        domain = a['href'].split('/')[2]
        domains.add(domain)
– Windows alternative: Use PowerGREP or a simple PowerShell regex:
(Get-Content page.html -Raw) | Select-String -Pattern '\b[\w.-]+@[\w.-]+\.\w{2,}\b' -AllMatches | ForEach-Object { $_.Matches.Value }
How to use: Save HTML files from your browser (Ctrl+S), then run the script to print the extracted entities; write them to a CSV file if you need to keep them. For live browsing, use a browser extension like “SingleFile” to archive pages and a local server to process them.
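The extraction steps above can be folded into one function that also writes the CSV. This is a minimal sketch, not Ubikron's actual method: it swaps BeautifulSoup for the standard library's `html.parser` so it runs without extra packages, and the names `extract_entities` and `write_csv` are hypothetical.

```python
import csv
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Illustrative pattern only; real-world email formats vary more than this.
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


class _PageParser(HTMLParser):
    """Collects text nodes and <a href> values from an HTML document."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def extract_entities(html):
    """Return a sorted list of (entity_type, value) pairs found in the HTML."""
    parser = _PageParser()
    parser.feed(html)
    text = " ".join(parser.text_parts)
    found = {("email", match) for match in EMAIL_RE.findall(text)}
    for href in parser.hrefs:
        host = urlparse(href).netloc  # empty for relative links
        if host:
            found.add(("domain", host.lower()))
    return sorted(found)


def write_csv(entities, path):
    """Write (entity_type, value) pairs to a two-column CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["type", "value"])
        writer.writerows(entities)
```

To process a saved page, read the file with `open("target.html", encoding="utf-8").read()`, pass the string to `extract_entities`, and hand the result to `write_csv`.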
- Building the Graph – Connecting Entities with Neo4j (Self‑Hosted)
Ubikron’s core innovation is the graph where nodes (emails, names, URLs) link to visited pages. You can replicate this using Neo4j, a free graph database. The steps below assume Ubuntu 22.04, but Windows versions exist.
Step‑by‑step: Local graph database for investigations
- Install Neo4j (Linux):
wget -O - https://debian.neo4j.com/neotechnology.gpg.key | sudo apt-key add -
echo 'deb https://debian.neo4j.com stable 4.4' | sudo tee /etc/apt/sources.list.d/neo4j.list
sudo apt update && sudo apt install neo4j
sudo systemctl start neo4j
- On Windows: Download Neo4j Desktop from neo4j.com, install, create a local database.
- Connect via Cypher: Use Python’s `neo4j` driver. After entity extraction, add nodes and relationships:
from neo4j import GraphDatabase

uri = "bolt://localhost:7687"
driver = GraphDatabase.driver(uri, auth=("neo4j", "password"))

def add_page(tx, url, title):
    # Merge on the URL alone so a changed title does not create a duplicate page.
    tx.run("MERGE (p:Page {url: $url}) SET p.title = $title", url=url, title=title)

def add_entity(tx, entity_type, value):
    # In Neo4j 4.4 a label cannot be a query parameter, so pass trusted label names only.
    tx.run("MERGE (e:" + entity_type + " {value: $value})", value=value)

def relate(tx, url, entity_value):
    tx.run("MATCH (p:Page {url: $url}), (e {value: $entity}) "
           "MERGE (p)-[:CONTAINS]->(e)", url=url, entity=entity_value)
– To show all pages where an entity appears (as mentioned in the post):
MATCH (e {value: '[email protected]'})<-[:CONTAINS]-(p:Page) RETURN p.url
What this does: Every visited page becomes a node, every extracted entity becomes another node, and edges show mentions. You can instantly backtrack to the source page, which mirrors the “stage of investigation” feature.
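Because the entity label is spliced into the Cypher text, it helps to validate it before building the query. The sketch below is one possible approach, not part of the original post: it prepares `(cypher, params)` pairs without touching the database. The label set in `ALLOWED_LABELS` and the function name `build_statements` are assumptions.

```python
# Assumed label set; adjust to the entity types your extractor produces.
ALLOWED_LABELS = {"Email", "Domain", "Phone", "Person", "Location"}


def build_statements(url, title, entities):
    """Return (cypher, params) pairs that link one page to its entities.

    `entities` is an iterable of (label, value) pairs.
    """
    statements = [
        ("MERGE (p:Page {url: $url}) SET p.title = $title",
         {"url": url, "title": title})
    ]
    for label, value in entities:
        # Reject anything outside the whitelist so untrusted text
        # can never end up inside the query string.
        if label not in ALLOWED_LABELS:
            raise ValueError(f"unexpected entity label: {label!r}")
        statements.append((
            f"MATCH (p:Page {{url: $url}}) "
            f"MERGE (e:{label} {{value: $value}}) "
            f"MERGE (p)-[:CONTAINS]->(e)",
            {"url": url, "value": value},
        ))
    return statements
```

Each pair can then be executed inside a write transaction with `tx.run(cypher, params)`; values always travel as parameters, and only whitelisted labels reach the query text.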
- Enrichments & OSINT Automation – Over 100 Data Points
The post mentions “over 100 data enrichments.” These include reverse WHOIS, geolocation of IPs, social media profiling, and hashtag tracking. Here’s how to enrich a domain name using free APIs.
Step‑by‑step: Enrich domains with IP geolocation and threat intel
– Linux/Windows command line for WHOIS & IP:
whois example.com | grep -E "Registrant|Creation Date"
nslookup example.com
– Python enrichment script (install requests, ipwhois):
import socket
from ipwhois import IPWhois

domain = "example.com"
ip = socket.gethostbyname(domain)
obj = IPWhois(ip)
results = obj.lookup_rdap(depth=1)
print(f"ASN: {results['asn']}, Country: {results['asn_country_code']}")
– Add coordinates to graph: Create a `Location` node with lat/lon, then link the IP or domain to it. For phone numbers, use the `phonenumbers` library to validate and derive country.
– API security tip: When using public enrichment APIs (e.g., VirusTotal, Shodan), never hardcode API keys. Use environment variables:
export VT_API_KEY="your_key"
In Python: `import os; key = os.getenv("VT_API_KEY")`
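The `Location` node mentioned above needs validated coordinates first. The following is a rough sketch that assumes coordinates appear as decimal "lat, lon" pairs in the page text. The pattern and the helper names `extract_coordinates` and `location_params` are illustrative, not taken from Ubikron.

```python
import re

# Matches decimal "lat, lon" pairs such as "40.7128, -74.0060".
# The lookbehind avoids starting a match in the middle of a longer number.
COORD_RE = re.compile(r"(?<![\d.])(-?\d{1,2}\.\d+)\s*,\s*(-?\d{1,3}\.\d+)")


def extract_coordinates(text):
    """Return a list of (lat, lon) float pairs that fall within valid ranges."""
    pairs = []
    for lat_s, lon_s in COORD_RE.findall(text):
        lat, lon = float(lat_s), float(lon_s)
        if -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0:
            pairs.append((lat, lon))
    return pairs


def location_params(lat, lon):
    """Build query parameters for a hypothetical Location node."""
    return {"lat": round(lat, 6), "lon": round(lon, 6)}
```

The resulting parameters could feed a statement such as `MERGE (l:Location {lat: $lat, lon: $lon})`, with a relationship from the page or IP node to the location.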
- Self‑Hosted Privacy & Data Control – Disable Saving Any Time
Ubikron offers a self‑hosted version and toggleable page saving. Implement this by using a local proxy or browser automation with granular controls.
Step‑by‑step: Privacy‑aware browsing pipeline with Playwright
- Install Playwright (Linux/Windows):
pip install playwright
playwright install
- Write a script that saves pages only when a flag is enabled:
from playwright.sync_api import sync_playwright

SAVE_ENABLED = True  # Toggle this variable

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://linkedin.com/in/example")
    if SAVE_ENABLED:
        content = page.content()
        # Explicit UTF-8 avoids encoding errors on Windows for non-Latin pages.
        with open("saved_page.html", "w", encoding="utf-8") as f:
            f.write(content)
    browser.close()
– For Windows, create a batch toggle:
@echo off
set SAVE=1
if "%SAVE%"=="1" python save_page.py
– Hardening: Run the extraction script (and, separately, the graph database) in Docker containers to isolate them from your main OS. Sample Dockerfile for the extractor:
FROM python:3.10
RUN pip install beautifulsoup4 requests neo4j
COPY extractor.py /app/
WORKDIR /app
CMD ["python", "extractor.py"]
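The save toggle described in this section can also be driven by an environment variable, so it can be flipped without editing the script. This is a small sketch under that assumption. The variable name `OSINT_SAVE` and both function names are hypothetical.

```python
import os
from pathlib import Path

_OFF_VALUES = {"0", "false", "off", "no"}


def saving_enabled(env=None):
    """Return True unless the (hypothetical) OSINT_SAVE variable disables saving."""
    env = os.environ if env is None else env
    return env.get("OSINT_SAVE", "1").strip().lower() not in _OFF_VALUES


def save_page(html, out_dir, name, env=None):
    """Write `html` to out_dir/name when saving is enabled.

    Returns the written path, or None when saving is switched off.
    """
    if not saving_enabled(env):
        return None
    target = Path(out_dir) / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(html, encoding="utf-8")
    return target
```

In the Playwright script, `save_page(page.content(), "archive", "saved_page.html")` would replace the `if SAVE_ENABLED:` block; setting `OSINT_SAVE=0` in the shell then pauses archiving.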
- Integrating Images, Reports & Beta Graphs – Full Investigation Suite
The tool also allows adding images (clippings) and writing reports. You can simulate this by generating a Markdown report that embeds graph screenshots and entity lists.
Step‑by‑step: Generate an investigation report from your graph
- Export graph visualization using Neo4j Browser’s PNG export or use `pyvis` to create an interactive HTML report:
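from pyvis.network import Network

net = Network()
net.add_node(1, label="Page: article.html")
net.add_node(2, label="Email: [email protected]")
net.add_edge(1, 2)
# save_graph writes the HTML file; open it in a browser to explore the graph.
# (show() can fail outside notebooks in some pyvis releases.)
net.save_graph("investigation.html")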
- To add images, store local file paths as node properties:
CREATE (c:Clipping {image_path: '/screenshots/post.png', description: 'LinkedIn post'})
- Write a report using Python’s `reportlab` (PDF) or simply output a structured JSON:
import json

report = {"entities": list(emails), "graph_nodes": 42, "pages_visited": ["url1", "url2"]}
with open("osint_report.json", "w") as f:
    json.dump(report, f, indent=2)
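The Markdown report mentioned at the start of this section could be assembled along these lines. It is a minimal sketch rather than a reproduction of Ubikron's report feature. The function name `render_markdown_report` and the report layout are assumptions.

```python
def render_markdown_report(title, pages, entities, screenshot=None):
    """Return a Markdown report as a string.

    `pages` is a list of URLs; `entities` maps an entity type
    (e.g. "email") to an iterable of values.
    """
    lines = [f"# {title}", "", "## Pages visited", ""]
    lines += [f"- {url}" for url in pages]
    for entity_type in sorted(entities):
        lines += ["", f"## {entity_type.capitalize()} entities", ""]
        lines += [f"- {value}" for value in sorted(entities[entity_type])]
    if screenshot:
        # Embed the exported graph image by relative path.
        lines += ["", "## Graph", "", f"![Graph screenshot]({screenshot})"]
    return "\n".join(lines) + "\n"
```

Writing the returned string to `osint_report.md` next to the exported graph PNG keeps the report and its evidence in one folder.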
- Mitigating Risks – When Graph OSINT Goes Wrong
Graph‑based investigations can accidentally expose sensitive data or violate terms of service. Always respect robots.txt, avoid aggressive crawling, and use self‑hosted tools to keep data local.
Step‑by‑step: Ethical scraping and cloud hardening
- Check robots.txt before any automation:
curl https://example.com/robots.txt
- Rate limiting to avoid IP blocks (Linux `cron` or Windows Task Scheduler):
import time
time.sleep(2)  # 2 seconds between requests
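- Cloud hardening for self‑hosted Neo4j: Use firewall rules (UFW on Linux, Windows Defender Firewall) to allow only localhost connections, or a trusted subnet if you reach the database over a LAN or VPN. The rule below allows the 192.168.1.0/24 subnet to reach the Bolt port (7687):
sudo ufw allow from 192.168.1.0/24 to any port 7687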
- If you must deploy in the cloud (AWS, Azure):
- Store credentials in AWS Secrets Manager or Azure Key Vault.
- Enable VPC and restrict inbound traffic to your IP.
- Use TLS for Neo4j (enable dbms.connector.bolt.tls_level=REQUIRED).
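The robots.txt check and the rate limiting from the start of this section can be combined in a fetch loop. Here is one way to sketch it with the standard library. The class name `RateLimiter`, the helper `build_robots`, and the two-second default are illustrative choices.

```python
import time
from urllib.robotparser import RobotFileParser


def build_robots(robots_txt):
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class RateLimiter:
    """Enforces a minimum delay between consecutive calls to wait()."""

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Before each request, call `robots.can_fetch("my-osint-bot", url)` (the user-agent string is a placeholder) and skip the URL if it returns False; otherwise call `limiter.wait()` and then fetch.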
- OSINT Workflow Example – From LinkedIn Post to Graph
The post’s screenshot includes names like Alex Lozano, Mario Santella, Dan Ramey. Here’s how you’d investigate them using a Ubikron‑style graph.
Step‑by‑step: Real‑world mini‑investigation
- Collect pages: Save LinkedIn profiles, Twitter posts, and news articles mentioning those names.
- Run entity extraction (script from section 1) across all saved HTML files. Output might be:
`["[email protected]", "twitter.com/mariosantella", "+1-555-1234"]`
- Import into Neo4j (section 2). The query below assumes names were stored as `Person` nodes with a `name` property; it finds people who share a page:
MATCH (p:Person {name: 'Alex Lozano'})--(page:Page)--(other:Person) RETURN other.name, page.url
- If a phone number appears on two different pages that both mention Dan Ramey and a specific geocoordinate, you have a strong lead that is worth verifying against other sources.
- Add enrichment (section 3) – resolve domain `lozano.com` to IP, geolocate, check VirusTotal reports.
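Before loading anything into Neo4j, the same co-occurrence reasoning can be tried in plain Python. The sketch below uses placeholder names and a made-up phone number. The data layout (page name mapped to a set of `(kind, value)` pairs) and both function names are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def shared_entities(page_entities):
    """Map each entity to the sorted pages it appears on,
    keeping only entities found on two or more pages."""
    pages_by_entity = defaultdict(set)
    for page, entities in page_entities.items():
        for entity in entities:
            pages_by_entity[entity].add(page)
    return {e: sorted(p) for e, p in pages_by_entity.items() if len(p) >= 2}


def linked_pairs(page_entities, kind="person"):
    """Return sorted pairs of `kind` entities that co-occur on at least one page."""
    pairs = set()
    for entities in page_entities.values():
        names = sorted(value for k, value in entities if k == kind)
        pairs.update(combinations(names, 2))
    return sorted(pairs)


# Placeholder data: two saved pages with their extracted entities.
pages = {
    "page1.html": {("person", "Person A"), ("person", "Person B"), ("phone", "+1-555-0100")},
    "page2.html": {("person", "Person B"), ("phone", "+1-555-0100")},
}
```

Calling `shared_entities(pages)` surfaces the entities that recur across pages, which are the candidates worth enriching first; `linked_pairs(pages)` lists the people who appear together.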
What Undercode Say:
- Key Takeaway 1: Ubikron’s graph approach transforms chaotic OSINT data into an explorable structure, enabling rapid backtracking and hidden connection discovery – a must‑have for investigators.
- Key Takeaway 2: Self‑hosting and toggleable saving are critical for privacy and legal compliance; replicating the core of this with open‑source tools (Python + Neo4j) is feasible in under 100 lines of code.
- Analysis: The tool’s beta graphs represent a shift from siloed data collection to relational intelligence. However, investigators must implement rate limiting and data minimization to avoid crossing ethical boundaries. The inclusion of image clippings and reports suggests a trend toward all‑in‑one OSINT workbenches, reducing reliance on disconnected tools like Maltego and browser extensions.
Prediction:
Within 18 months, AI‑powered entity extraction and graph recommendation engines will become standard in free OSINT tools. Ubikron’s approach will likely inspire forks that integrate large language models to auto‑tag entities (e.g., “this phone number belongs to a known scam pattern”) and even hypothesize links before the investigator finds them. Commercial vendors will push cloud‑based graphs, but the self‑hosted, privacy‑first movement will grow – especially after high‑profile leaks from centralized OSINT platforms. Expect law enforcement and corporate security teams to adopt internal graph solutions based on this exact browser‑to‑graph pipeline.
Reported By: Logan Woodward – Hackers Feeds


