Duly Noted

Questions of AI authorship and ownership can be divided into two broad types. One concerns the vast troves of human-authored material fed into AI models as part of their “training” (the process by which their algorithms “learn” from data). The other concerns ownership of what AIs produce.
Fully aware that vast data scraping is legally untested—to say the least—developers charged ahead anyway, resigning themselves to litigating the issue in retrospect. Publisher Peter Schoppert has called the training of LLMs without permission the industry’s “original sin”—to be added, we might say, to the technology’s mind-boggling consumption of energy and water on an overheating planet.

By the end of the period we analyzed, in the financial dataset we estimate about 18% of the data was generated by LLM, around 24% in company press releases, up to 15% for young and small companies job postings, and 14% for international organizations.

It’s hard to say how accurate this is, since I’m not sure AI detection models are that reliable. But regardless of the exact adoption rate, there is a surge in usage followed by a plateau, reflecting that not everything can be solved by AI.

Ginsparg was frustrated because he couldn’t understand why implementing features that used to take him a day now took weeks. I challenged him on this, asking if there was any documentation for developers to onboard the new code base. Ginsparg responded, “I learned Fortran in the 1960s, and real programmers didn’t document,” which nearly sent me, a coder, into cardiac arrest.

An interview with the creator of arXiv, which I’ve learned is pronounced ‘archive’. I’ve read so many good papers on this site and none are paywalled. This quote about Ginsparg happily writing code without documentation especially pleased me. Why not? It works, doesn’t it?

Software gets more complicated. All of this complexity is there for a reason. But what happened to specializing? When a house is being built, tons of people are involved: architects, civil engineers, plumbers, electricians, bricklayers, interior designers, roofers, surveyors, pavers, you name it. You don’t expect a single person, or even a whole single company, to be able to do all of those.

I mean, this is the business of software engineering. Work gets compressed to whoever can generate the most revenue per employee.

The core idea is to separate the process into distinct components: a Planner, an Evaluator, and an Executor. The Planner generates a plan based on the user’s query. The Evaluator validates the generated plan. The Executor only executes plans that have been validated, ensuring that only sound plans are carried out.
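To make the split concrete, here is a minimal sketch of how the three roles might fit together. Everything in it is my own assumption rather than anything from the article: the `Plan` type, the `ALLOWED_ACTIONS` whitelist, and the function names are hypothetical, and the planner is a hard-coded stand-in for what would presumably be an LLM call.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical whitelist the evaluator checks plans against.
ALLOWED_ACTIONS = {"search", "summarize", "reply"}


@dataclass
class Plan:
    steps: list[str]


def planner(query: str) -> Plan:
    # Stand-in for an LLM call that turns a query into a plan.
    return Plan(steps=["search", "summarize", "reply"])


def evaluator(plan: Plan) -> bool:
    # A plan is valid only if it is non-empty and every step is allowed.
    return bool(plan.steps) and all(s in ALLOWED_ACTIONS for s in plan.steps)


def executor(plan: Plan) -> list[str]:
    # Pretend to run each step and report what happened.
    return [f"ran {step}" for step in plan.steps]


def run(query: str, plan_fn: Callable[[str], Plan] = planner) -> list[str]:
    plan = plan_fn(query)
    if not evaluator(plan):
        # Unvalidated plans never reach the executor.
        raise ValueError("plan rejected by evaluator")
    return executor(plan)
```

Note that the evaluator here only checks the plan's shape before anything runs. It says nothing about catching a step that goes wrong during execution.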

And I guess the human rubber-stamps it? Nowhere does it mention controlling for mistakes.

_{Quote Citation: Cedric Chee, “The DNA of AI Agents: Common Patterns in Recent Design Principles”, Dec 24, 2024, https://cedricchee.com/blog/the-dna-of-ai-agents/}

Compound mistakes: an agent often needs to perform multiple steps to accomplish a task, and the overall accuracy decreases as the number of steps increases. If the model’s accuracy is 95% per step, over 10 steps, the accuracy will drop to 60%, and over 100 steps, the accuracy will be only 0.6%.
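The numbers in the quote follow from multiplying the per-step success rate by itself once per step. A small sketch of that arithmetic, assuming each step succeeds or fails independently (the function name `compound_accuracy` is mine, not the article's):

```python
def compound_accuracy(per_step: float, steps: int) -> float:
    """Chance that all `steps` independent steps succeed."""
    return per_step ** steps


for n in (1, 10, 100):
    print(f"{n:>3} steps: {compound_accuracy(0.95, n):.1%}")
# prints:
#   1 steps: 95.0%
#  10 steps: 59.9%
# 100 steps: 0.6%
```

Real agent steps are probably not independent, so treat this as a rough illustration of the trend rather than a prediction.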

Herein lies the rub with agents. Once they tumble down a bad path, how can they recover? It reminds me of the rumor(?) that an AI model, when instructed to last as long as possible in Tetris, merely paused the game.

Every time I’ve built or inherited a team, one trait rises above all others: ownership. Do you ship? Do you own it when it breaks? Do you make the system better for the next person?

Also buried as a footnote: ‘business > tech’. Being great at problem solving means being great with whatever tools solve the problem most efficiently.

_{Quote Citation: Ben Howdle, “How crawlers impact the operations of the Wikimedia projects”, 02 Apr 2025, https://benhowdle.im/principles-and-implementation}

we found out that at least 65% of this resource-consuming traffic we get for the website is coming from bots, a disproportionate amount given the overall pageviews from bots are about 35% of the total. This high usage is also causing constant disruption for our Site Reliability team, who has to block overwhelming traffic from such crawlers before it causes issues for our readers.

I do wonder about the future of the internet. Crawling is nothing new; it’s how Google built its original search engine. But if 65% of the most resource-hungry traffic is robots, where are the humans?

Long-term maintainability: This is the most insidious impact radius because it has the longest feedback loop, these issues might only be caught weeks and months later. These are the types of cases where the code will work fine for now, but will be harder to change in the future. Unfortunately, it’s also the category where my 20+ years of programming experience mattered the most.

An article covering all the ways an expert engineer had to guide AI out of pitfalls, away from landmines and, most importantly, toward a long-term sustainable architecture. If AI is truly a junior engineer, code quality will regress (see GitClear’s research on AI depressing code quality).

Industry leaders don’t have a good track record of predicting AI developments. … As an example, Sutskever had an incentive to talk up scaling when he was at OpenAI and the company needed to raise money. But now that he heads the startup Safe Superintelligence, he needs to convince investors that it can compete with OpenAI, Anthropic, Google, and others, despite having access to much less capital. Perhaps that is why he is now talking about running out of data for pre-training, as if it were some epiphany and not an endlessly repeated point.

Copyright law, AI Prompting and output ownership

Business Adoption of LLM for writing

The programming history of arXiv

T-shaped skills for programming

AI Agents rely on expert planners

How AI Agents might practically work

Insights from a CTO practitioner

Robots all the way down

Coding Expertise with Vibes

AI Hype machine