The Co-Creator of Django on Data Journalism and Building Datasette

[HPP] Simon Willison · December 8, 2025 · 1h 1min

The Genesis of Data Journalism

  • 💡 Simon Willison, co-creator of Django, began his career at a local newspaper in Kansas, where the framework was first developed to manage local data such as events and venues.
  • 🎯 His passion for data journalism centers on using data and databases to uncover and tell stories; publishing a searchable database alongside a story helps build credibility.
  • Noteworthy examples include the Washington Post's opioid data project and the rise of non-profit newsrooms such as ProPublica and the Baltimore Banner, which foster collaborative data efforts.

Datasette: From Sharing to Exploration

  • 🚀 Datasette originated from the need to publish read-only SQLite databases on serverless hosting, addressing the limitations of tools like Google Sheets for data sharing.
  • 🧩 Its plugin system transformed it into a versatile multi-tool, enabling data exploration, visualization, and cleaning beyond its initial distribution purpose.
  • ⚠️ Unexpected applications include managing electricity grid information, historical research on a Brooklyn cemetery, and critical open-source intelligence work by Bellingcat on leaked data.
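
The read-only publishing idea above can be sketched with Python's standard `sqlite3` module. This is a minimal illustration, not Datasette's own implementation; the `events.db` file name and the `events` table are hypothetical, chosen to echo the events-and-venues data mentioned earlier.

```python
import os
import sqlite3
import tempfile


def build_database(path):
    """Create a small SQLite file with a hypothetical 'events' table."""
    conn = sqlite3.connect(path)
    with conn:  # commits on success
        conn.execute(
            "CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT, venue TEXT)"
        )
        conn.executemany(
            "INSERT INTO events (name, venue) VALUES (?, ?)",
            [("Farmers market", "Town square"), ("Jazz night", "Main St. hall")],
        )
    conn.close()


def open_read_only(path):
    """Open the file through a URI with mode=ro so SQLite rejects writes."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


db_path = os.path.join(tempfile.mkdtemp(), "events.db")
build_database(db_path)

ro = open_read_only(db_path)
rows = ro.execute("SELECT name, venue FROM events ORDER BY id").fetchall()

# Any write attempt on the read-only connection should fail.
try:
    ro.execute("DELETE FROM events")
    write_blocked = False
except sqlite3.OperationalError:
    write_blocked = True
```

A file built this way is the kind of artifact Datasette is designed to serve; running `datasette events.db` locally should expose it as a browsable, queryable web interface.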

Overcoming Data Engineering Hurdles

  • 🛠️ A significant challenge in data engineering is the lack of version control (e.g., Git) for scripts and notebooks, leading to poor repeatability and documentation.
  • 🧠 Data cleaning consumes the vast majority of data professionals' time, highlighting a need for more efficient and verifiable workflows.
  • 🚨 Poor data documentation and a lack of "view source" for reports can lead to critical errors, emphasizing the need for clear communication of data models and queries.

Cultivating a Strong Data Culture

  • ✅ Effective data teams prioritize documentation, distinguishing between official, trustworthy docs and dated, low-commitment internal blog posts or "Today I Learned" (TIL) entries.
  • 💬 Encouraging a culture of sharing knowledge through internal platforms helps build credibility and visibility for data teams.
  • 🔑 Making SQL queries first-class citizens with version control, ownership, and comment capabilities can foster better collaboration and understanding within teams.
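
One way to picture a SQL query as a first-class object is a small record that carries its owner, a description, and review comments alongside the SQL text. The sketch below is only an illustration of that idea; the `SavedQuery` class, its fields, and the `signups` table are all hypothetical and do not come from the conversation or from any particular tool.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class SavedQuery:
    """A named query plus the metadata a reviewer would want to see."""
    name: str
    owner: str
    description: str
    sql: str
    comments: list = field(default_factory=list)

    def run(self, conn):
        """Execute the stored SQL and return all rows."""
        return conn.execute(self.sql).fetchall()


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signups (day TEXT, count INTEGER)")
conn.executemany(
    "INSERT INTO signups VALUES (?, ?)",
    [("2025-01-01", 3), ("2025-01-02", 5)],
)

total_signups = SavedQuery(
    name="total_signups",
    owner="data-team",
    description="Total signups across all days.",
    sql="SELECT SUM(count) FROM signups",
)
total_signups.comments.append("Reviewed: sums every row, no filters applied.")

result = total_signups.run(conn)
```

Keeping definitions like this in plain files inside a Git repository would give each query a change history for free, which is the version-control half of the idea.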

The AI-Driven Data Future

  • 🤖 LLM agents like Claude Code, operating in a loop with a terminal, are revolutionizing data cleaning and manipulation by automating tasks through trial and error.
  • 📊 The future of BI dashboards is expected to be prompt-driven, with LLMs quickly generating and customizing visualizations in response to user requests.
  • 🔬 Key LLM applications in data analysis include efficient text-to-SQL generation, highly accurate data extraction from unstructured documents, and large-scale data enrichment for augmenting datasets.
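
The trial-and-error loop and text-to-SQL generation described above can be sketched together: show the model the schema, run whatever SQL it returns, and feed any database error back for another attempt. This is a rough sketch under stated assumptions, not how Claude Code or any specific product works. `generate_sql` is a hypothetical callable standing in for a real model, and `fake_model` is a hard-coded stub so the example runs without network access.

```python
import sqlite3


def schema_of(conn):
    """Collect CREATE TABLE statements so the model can see the structure."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    return "\n".join(row[0] for row in rows)


def answer(conn, question, generate_sql, max_attempts=3):
    """Ask for SQL, run it, and pass any error back to the model to retry."""
    prompt = (
        f"Schema:\n{schema_of(conn)}\n\n"
        f"Question: {question}\nReply with one SELECT statement."
    )
    feedback = ""
    for _ in range(max_attempts):
        sql = generate_sql(prompt + feedback)
        try:
            return sql, conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            feedback = f"\n\nPrevious attempt:\n{sql}\nError: {exc}"
    raise RuntimeError("no working query after retries")


def fake_model(prompt):
    """Stub for an LLM: wrong at first, corrected once it sees an error."""
    if "Error:" in prompt:
        return "SELECT COUNT(*) FROM venues"
    return "SELECT COUNT(*) FROM venue"  # deliberately misspelled table name


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE venues (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO venues (name) VALUES (?)", [("Hall",), ("Park",)])

sql, rows = answer(conn, "How many venues are there?", fake_model)
```

Because generated SQL is untrusted, a real setup would likely run it against a read-only connection rather than a writable one.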

Topics

Django, Data Journalism, Datasette, Open Source, Data Engineering, Data Documentation, Data Cleaning, LLM Agents, Claude Code, BI Dashboards, Text to SQL, Data Extraction, Data Enrichment, SQLite, Version Control