Go To Hellman
This machine surrounds hate and forces it to surrender.
Thursday, January 1, 2026
New Job: Project Gutenberg
Personal Note, January 1 2026: I have a new job: Executive Director of the Project Gutenberg Literary Archive Foundation. Here’s what I wrote for PG’s January Newsletter.
Greetings from the new Executive Director
Happy Public Domain Day! You might hear people say that books published in 1930 have “fallen” into the US Public Domain, or, that they have lost copyright “protection”. This is not quite correct. Rather, books published in 1930 have been FREED of copyright restrictions. They have ASCENDED into the public domain and into the embrace of organizations like Project Gutenberg. They now belong to ALL of us, and we need to take care of them for future generations.
On October 21, Project Gutenberg lost its longtime leader, Greg Newby, to pancreatic cancer. I had agreed to step up as Acting Executive Director so that Project Gutenberg could continue the mission that had become Greg’s life work: to serve and preserve public domain books so that all of us can use and enjoy them without restrictions. Although I’ve been doing development work for Project Gutenberg for the past 8 years, I did not really understand what Greg’s job entailed, or how many tasks he had been juggling. Three months in, I’m still discovering mysterious-to-me aspects of the organization. I’ve also been amazed at the dedication and talent of the many volunteers behind Project Gutenberg and our sister organization, Distributed Proofreaders, and at the large number of donors who make the organization financially viable and sustainable. So as of 2026, with your support, I’m continuing as Executive Director.
In the past three months Project Gutenberg has proven to be resilient; we took a heavy blow and managed to keep going. My top priority going forward is to make Project Gutenberg even more sustainable as well as resilient. In other words, my job is to be one runner in a relay race: take the baton and make sure I get it to the next runner. That’s what we all have to do with public domain books, too. We want them to still be there in 50 years! Whether you’re already a volunteer or booster, an avid reader, or just someone curious about what we do, I hope you’ll help us pass that baton.
Tuesday, April 22, 2025
Boston Marathon Strava-verse: Paul Revere’s ride
In seventh grade, Miss Phillips had me memorize “Paul Revere’s Ride” by Henry Wadsworth Longfellow. So I did. After finishing “Jabberwocky” to start off the year of run naming, it seemed obvious what my next effort would be. I calculated that I could arrange to end it on the day of the Boston Marathon, thus neatly tying the verse with the running. And to top it off, the “18th of April” cited in the poem was exactly 250 years ago on Friday.
“Paul Revere’s Ride” was first published in The Atlantic Monthly in 1861.
of the midnight ride of Paul Revere
and of my runs like this one here.
Hardly a man is now alive who remembers that famous day and year,
or when the end of this poem shall arrive.
By land or sea from the town to-night”
They be lost in New Jersey, no turnpike in sight.
of the North Church tower as a signal light,—
One, if by land, and two, if by sea;
But if it be ‘puter, then ye shall put three.
ready to ride and spread the alarm
the royalists are coming and they mean to do harm!
For the country folk to be up and to arm.
On up the Park Street and down by the pond
to Chester, where merriment and good folk were found.
Silently rowed to the Charlestown shore
safe from the royalists and their childish roar.
Where swinging wide at her moorings lay
An emperor who would have his way
… with each mast and spar
across the moon like a prison bar,
that traitorous rogue will go too far.
by its own reflection in the tide,
For the pacer and the patriot,
there’s no place left to hide.
Wanders and watches with eager ears,
wondering what we can do in these years.
the muster of men at the barrack door,
while the good folk of the country wish back on before.
Forgetting their watches, they ran point eight four.
and the measured tread of the grenadiers,
…already tired of the next few years.
By the wooden stairs, with stealthy tread,
the view kept coming, no need to search
“Resist! Resist!” he angrily said
And startled the pigeons from their perch.
and moving shapes of shade,
seen through hay-air glasses.
to the highest window in the wall
for in the coming fateful brawl,
he will see the mighty fall.
A moment on the roofs of the town
the sun would soon rise and the breads would be round.
Crescent and full, they’re having a ball.
in their night-encampment on the hill
Warning lights blaring red,
this couldn’t be a drill.
that he could hear,
like a sentinel’s tread,
the muskrat’s sneer
as he left them for dead.
creeping along from tent to tent,
And seeming to whisper, “All is well!”,
but veterans all, ’twas bad news to tell.
of the place and the hour,
A code for unlocking the library’s power.
Sixteen falcons thundering overhead
Put in the water, a drone menace on the quay.
three walkers ramble in a state of dismay.
like a bridge of boats coming to destroy, despite our votes.
with a heavy stride he knew that those soldiers were on the wrong side.
Now he patted his horse’s side,
no yielding today,
he was wholly without fear.
Then, impetuous, stamped the earth,
Hoping present horrors would give way to rebirth.
But mostly he watched with eager search
Five or six hundred? He wondered the worth.
above the graves on the hill,
his fear for his country grows and grows.
And lo! as he looks, on the belfry’s height,
a somber thought. might is not right.
He springs to the saddle, the bridle he turns;
danger approaches with heighted concerns.
a second lamp in the belfry burns!
By sea it will be
that good people defeat the tyrant’s might.
a shape in the moonlight,
a bulk in the dark,
a sheet on the mark.
Struck out by a steed flying fearless and fleet
Feet flying forward like a harley in heat.
And yet, through the gloom and the light,
the fate of a nation was riding that night;
in two years or late, all will be put right.
Kindled the land into flame with its heat
for justice and doing all that is right.
And beneath him, tranquil and broad and deep,
are the values and promises we keep.
And under the alders, that skirt its edge,
trouble may be coming but still hope resides
Is heard the tramp of his steed as he rides.
Our trusty band stay true to their pledge.
When he crossed the bridge into Medford town,
he heard the crowing of the cock,
running round and roun’ the anserine flock
Who sniffed a rat come into town
And felt the damp of the river fog,
That rises after the sun goes down.
when he galloped into Lexington
while everything had gone amok,
way down in Washington
in the moonlight as he passed
No time for talk, too late now,
the tyranny would not last.
gaze at him with a spectral glare
When he came to the bridge in Concord town.
With a figure of love he took the walk
and the twitter of birds among the trees
the sheep felt a shock
and the twitter said “Oh Please!”
blowing over the meadows brown.
Till the running faeries squeeze
colors over cap and gown.
Who at the bridge would be first to fall,
Not from the sleet pelting on his head
Nor from fog depressing us all
pierced by a British musket-ball.
Facing a taxing dread,
against a tyrant we must still stand tall.
how the British Regulars fired and fled,
They failed the test as shall we all,
if we don’t heed the siren call.
From behind each fence and farm-yard wall,
Poor souls trapped in the tyrant’s thrall.
Then crossing the fields to emerge again
Confused by the tumult of where and when.
They’ve trampled good faith,
ignored all the code.
And only pausing to fire and load.
hoping to save values we hold dear.
To every Middlesex village and farm,
by Essex schools in hurried flight.
Shouting a message so powerful, so clear.
And a word that shall echo forevermore!
Two hundred fifty years to the day
That echo rings, it won’t go away.
Through all our history, to the last
The present is tiny, our future is vast.
The people will waken and listen to hear
No matter their sex, gender, color race or creed
A message so powerful, so urgent and clear.
The crowds of townsfolk who shout and cheer
Those who run today and speed
the midnight message of Paul Revere.
Friday, March 21, 2025
AI bots are destroying Open Access
There’s a war going on on the Internet. AI companies with billions to burn are hard at work destroying the websites of libraries, archives, non-profit organizations, and scholarly publishers, anyone who is working to make quality information universally available on the internet. And the technologists defending against this broad-based attack are doing everything they can to preserve their outlets while trying to remain true to the mission of providing the digital lifeblood of science and culture to the world.
Yes, many of these beloved institutions are under financial pressures in the current political environment, but politics swings back and forth. The AI armies are only growing more aggressive, more rapacious, more deceitful and ever more numerous.
I’m talking about the voracious hunger of AI companies for good data to train Large Language Models (LLMs). These are the trillion-parameter sets of statistical weights that power things like Claude, ChatGPT and hundreds of systems you’ve never heard of. Good training data has lots of text and lots of metadata, and is reliable and unbiased. It’s unsullied by Search Engine Optimization (SEO) practitioners. It doesn’t constantly interrupt the narrative flow to try to get you to buy stuff. It’s multilingual, subject specific, and written by experts. In other words, it’s like a library.
At last week’s Code4lib conference hosted by Princeton University Library, technologists from across the library world gathered to share information about library systems, how to make them better, how to manage them, and how to keep them running. The hot topic, the thing everyone wanted to talk about, was how to deal with bots from the dark side.
Bots on the internet are nothing new, but a sea change has occurred over the past year. For the past 25 years, anyone running a web server knew that the bulk of traffic was one sort of bot or another. There was googlebot, which was quite polite, and everyone learned to feed it – otherwise no one would ever find the delicious treats we were trying to give away. There were lots of search engine crawlers working to develop this or that service. You’d get “script kiddies” trying thousands of prepackaged exploits. A server secured and patched by a reasonably competent technologist would have no difficulty ignoring these.
The old style bots were rarely a problem. They respected robot exclusions and “nofollow” warnings. The warning helped bots avoid volatile resources and infinite parameter spaces. Even when they ignored exclusions they seemed to be careful about it. They declared their identity in “user-agent” headers. They limited the request rate and number of simultaneous requests to any particular server. Occasionally there would be a malicious bot like a card-tester or a registration spammer. You’d often have to block these based on IP address. It was part of the landscape, not the dominant feature.
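To make those old-style manners concrete, here is a minimal sketch of a crawler-side policy object, written in Python with only the standard library. The robots.txt content, the `ExampleBot` name and the `PoliteCrawlPolicy` class are all hypothetical, invented for illustration; a real crawler would fetch the site's actual robots.txt and keep one such policy per host.

```python
import time
import urllib.robotparser

# Hypothetical robots.txt for an imagined library site (an assumption).
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Disallow: /search
""".splitlines()


class PoliteCrawlPolicy:
    """Tracks what one crawler may fetch from one host, and how soon."""

    def __init__(self, user_agent, robots_lines, default_delay=1.0,
                 clock=time.monotonic):
        # The same honest user-agent string is meant to go out with every request.
        self.user_agent = user_agent
        self.parser = urllib.robotparser.RobotFileParser()
        self.parser.parse(robots_lines)
        # Honor Crawl-delay if the site declares one, else use a default.
        delay = self.parser.crawl_delay(user_agent)
        self.delay = float(delay) if delay is not None else default_delay
        self.clock = clock
        self.next_allowed = None  # no request has been made yet

    def allowed(self, url):
        """True if robots.txt permits this user agent to fetch the URL."""
        return self.parser.can_fetch(self.user_agent, url)

    def seconds_to_wait(self):
        """How long to sleep before the next request to this host."""
        if self.next_allowed is None:
            return 0.0
        return max(0.0, self.next_allowed - self.clock())

    def record_request(self):
        """Call right after each request so the next one is spaced out."""
        self.next_allowed = self.clock() + self.delay
```

A crawler loop would check `allowed(url)`, sleep for `seconds_to_wait()`, make one request at a time, then call `record_request()`. Nothing here is enforced by the server, which is the point of the paragraph above: it only works when the bot chooses to behave.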
The current generation of bots is mindless. They use as many connections as you have room for. If you add capacity, they just ramp up their requests. They use randomly generated user-agent strings. They come from large blocks of IP addresses. They get trapped in endless hallways. I observed one bot asking for 200,000 nofollow redirect links pointing at OneDrive, Google Drive and Dropbox (which of course didn’t work, but OneDrive decided to stop serving our Canadian human users). They use up server resources – one speaker at Code4lib described a bug where software they were running was using 32-bit integers for session identifiers, and it ran out!
The good guys are trying their best. They’re sharing block lists and bot signatures. Many libraries are routinely blocking entire countries (nobody in China could possibly want books!) just to be able to serve a trickle of local requests. They are using commercial services such as Cloudflare to outsource their bot-blocking and captchas, without knowing for sure what these services are blocking, how they’re doing it, or whether user privacy and accessibility are being flushed down the toilet. But nothing seems to offer anything but temporary relief. Not that there’s anything bad about temporary relief, but we know the bots just intensify their attack on other content stores.
[Image: The view of MIT Press’s Open-Access site from the Wayback Machine.]
The surge of AI bots has hit Open Access sites particularly hard, as their mission conflicts with the need to block bots. Consider that Internet Archive can no longer save snapshots of one of the best open-access publishers, MIT Press, because of Cloudflare blocking (see above). Who knows how many books will be lost this way? Or consider that the bots took down OAPEN, the world’s most important repository of scholarly OA books, for a day or two. That’s 34,000 books that AI “checked out” for two days. Or recent outages at Project Gutenberg, which serves 2 million dynamic pages and a half million downloads per day. That’s hundreds of thousands of downloads blocked! The link checker at doab-check.ebookfoundation.org (a project I worked on for OAPEN) is now showing 1,534 books that are unreachable due to “too many requests”. That’s 1,534 books that AI has stolen from us! And it’s getting worse.
Thousands of developer hours are being spent on defense against the dark bots and those hours are lost to us forever. We’ll never see the wonderful projects and features they would have come up with in that time.
The thing that gets me REALLY mad is how unnecessary this carnage is. Project Gutenberg makes all its content available with one click on a file in its feeds directory. OAPEN makes all its books available via an API. There’s no need to make a million requests to get this stuff!! Who (or what) is programming these idiot scraping bots? Have they never heard of a sitemap??? Are they summer interns using ChatGPT to write all their code? Who gave them infinite memory, CPUs and bandwidth to run these monstrosities? (Don’t answer.)
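For the sitemap question, here is a small Python sketch of what a well-behaved harvester could do instead of following every link: read one sitemap document and pull out the URLs, optionally skipping ones that have not changed. The sample XML and the `example.org` URLs are made up for illustration, and `urls_changed_since` is a hypothetical helper, not anything Project Gutenberg or OAPEN provides. It assumes `lastmod` values are plain `YYYY-MM-DD` dates, so that string comparison orders them correctly.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Made-up sitemap for an imagined site (an assumption for this sketch).
SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/books/1</loc><lastmod>2025-01-15</lastmod></url>
  <url><loc>https://example.org/books/2</loc><lastmod>2025-03-02</lastmod></url>
  <url><loc>https://example.org/books/3</loc></url>
</urlset>
"""


def urls_changed_since(sitemap_xml, since):
    """Return page URLs whose lastmod is on or after `since` (YYYY-MM-DD).

    Entries without a lastmod are kept, since we cannot tell whether
    they changed.
    """
    root = ET.fromstring(sitemap_xml)
    urls = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        if loc and (lastmod is None or lastmod >= since):
            urls.append(loc.strip())
    return urls
```

One request for the sitemap replaces a blind crawl of the whole link graph, and on a return visit the harvester only needs the entries that changed.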
We are headed for a world in which all good information is locked up behind secure registration barriers and paywalls, and it won’t be to make money, it will be for survival. Captchas will only be solvable by advanced AIs and only the wealthy will be able to use internet libraries.
Or maybe we can find ways to destroy the bad bots from within. I’m thinking a billion rickrolls?
Notes:
- I’ve found that I can no longer offer more than 2 facets of faceted search. Another problematic feature is “did you mean” links. AI bots try to follow every link you offer even if there are a billion different ones.
- Two projects, iocaine and nepenthes, are enabling the construction of “tarpits” for bots. These are automated infinite mazes that bots get stuck in, perhaps keeping the bots occupied and not bothering anyone else. I’m skeptical.
- Here is an implementation of the Cloudflare Turnstile service (supposedly free) that was mentioned favorably at the conference.
- It’s not just open access, it’s also Open Source.
- Cloudflare has announced an “AI honeypot”. Should be interesting.
- One way for Open Access sites to encourage good bot behavior is to provide carrots to good robots. For this reason, it would be good to add Common Crawl to greenlists: https://commoncrawl.org/ccbot
- Ian Mulvany (BMJ) concurs.
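The greenlist idea in the notes above could be sketched roughly as follows, in Python. Because user-agent strings are easy to fake, this sketch only trusts a crawler when its declared name and its source address both match a known entry. The `ExampleGoodBot` token and the address ranges are placeholders (reserved documentation ranges), not Common Crawl's real values; for a crawler such as CCBot, the real token and ranges would have to come from the operator's own documentation, assuming the operator publishes them.

```python
import ipaddress

# Hypothetical greenlist: user-agent token -> networks the bot is known to use.
# The ranges below are reserved documentation addresses, not real crawler ranges.
GREENLIST = {
    "ExampleGoodBot": [
        ipaddress.ip_network("192.0.2.0/24"),
        ipaddress.ip_network("2001:db8::/32"),
    ],
}


def is_greenlisted(user_agent, remote_addr, greenlist=GREENLIST):
    """True only if the declared bot name AND the source address both match."""
    addr = ipaddress.ip_address(remote_addr)
    ua = user_agent.lower()
    for token, networks in greenlist.items():
        if token.lower() in ua:
            # Compare only networks of the same IP version as the address.
            return any(
                addr.version == net.version and addr in net
                for net in networks
            )
    return False
```

A site could let greenlisted requests skip captchas or throttling while everything else goes through the usual defenses. A request that claims the good bot's name from an unlisted address is treated like any other stranger.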
Tuesday, February 11, 2025
Strava Verse
’Twas brillig, and the slithy toves did gyre and gimble in the wabe
I love running with my slithy toves!
All mimsy were the borogoves, and the mome raths outgrabe.
My right knee was a grobble mimsy today, but mome what a rath!
Beware the Jabberwock, my son!
Also, the Jabberrun can be hard on the knees.
The jaws that bite, the claws that catch!
ERC hosted run had quiche to bite and George to catch.
He took his vorpal sword in hand
New York Sirens game. Women with vorpal sticks. Slain by the Charge 3-2.
Beware the Jubjub bird, and shun the frumious Bandersnatch!
Definitely well salted and frumious out there today.
Long time the manxome foe he sought
But quick the manxless chill he caught
So rested he by the Tumtum tree
Covered with snow in filagree
And stood a while in thought.
Though clabbercing in a profunctional dot!
And, as in uffish thought he stood
Trolloping thru the Brookdale wood.
The Jabberwock, with eyes of flame
Cheld and hord, a glistering name…
Came whiffling through the tulgey wood
And caught the two burblygums because he could.
And burbled as it came!
So late the Jabberrun slept
For Eight Muyibles passed as though aflame
O’er Curbles and Nonces the pluffy sheep leapt.
One, two! One, two! And through and through
Three four! Three four! Sankofa’s coffee’s fit to pour.
The vorpal blade went snicker-snack!
The Icebeest of Hoth kept blobbering back.
He went galumphing back.
He left it dead, and with its head
… the Garmind sprang to life
And hast thou slain the Jabberwock?
The ice, the snow, it’s hard as rock.
Come to my arms, my beamish boy!
Think of my knees! Oy oy oy oy.
O frabjous day! Callooh! Callay!”
O jousbarf night! The fluss! The fright!
He chortled in his joy.
(And padoodled the rest of the way!)
’Twas brillig and the slithy toves
Did not, had not, could not loave.
Did gyre and gimble in the wabe
“Dunno.” said the wormly autoclave
All mimsy were the borogoves,
Again and again, beloo and aboave
And the mome raths outgrabe.
The end. Ooh ooh Babe!
- I previously invented “clickstream poetry”. It never caught on.
Tuesday, November 12, 2024
Thank you, New York City
[Image: fresh off the bus]
A few meters to my right I saw a woman wearing a large pink button proclaiming her status as a “Birthday Girl”. Her shirt had the name “HEATHER” across the front. I shouted “HAPPY BIRTHDAY HEATHER!”, and she turned to look at me, a bit startled. I walked over and we chatted a bit. She was from the UK, and was running New York to celebrate turning 50. I told her she was going to have fun, and that the crowd would be calling to her the whole way. “Really?” she said. “Hey, this is New York”, I reassured her. “You don’t have to know someone 10 years before you can talk to them on a first name basis!”
Then, over to the side of the corral, I saw another woman, wearing a BIRTHDAY GIRL shirt. “Heather, you must go over and wish her happy birthday!” Heather hesitated, but I said “Aw come on!” and led her through the crowd to the other birthday girl. The two marathon twins hugged, and everything felt right with the world. I looked around and the crowd seemed a bit anxious waiting. I shouted “Hey everyone! We have two birthday girls running with us! Let’s sing Happy Birthday!”
And so I led a happy chorus of more than a thousand runners in a joyful rendition of “Happy Birthday”. Miraculous. My whole day was like that. From start to end, the crowd was shouting my name. They got riled up when I acknowledged them, sometimes chanting “ERIC, ERIC, ERIC” as I gave them high fives.
I had decided to run the 2024 New York City Marathon about ten months earlier. A friend heard me talk about running and suggested that I get a fundraising entry through the charity he was involved with. At that point I had just run my 11th Half Marathon but never a marathon. A marathon seemed an unnecessary stretch for me and my creaky legs. But I decided in an instant. Two days later I told a running friend, Janell, and a few others about my decision. I knew I couldn’t back out after that.
[Image: still looking good at mile 9]
Re-entering the park for the last half mile, I was determined to finish it running. BIG MISTAKE! I cramped up immediately and could barely stagger on. But after a few minutes, my legs consented to a sloooow walk and finally relented on a brisk finish. Then a second miracle occurred. I knew I had friends who were volunteering at the finish line, but to see and hug them all was a blessing I had not expected. And to get the medal from my friend Janell!
[Image: Back of the medal with braille text “TCS New York City Marathon”]
Thank you to everyone who donated to my fundraiser for Amref Health Africa. Thank you to Karen and Axel for getting me home with my cramping legs. Thank you to the coaches, runners and PTs who helped me get through the training. Thank you to all the spectators and to the volunteers who got me from the start to the finish, and thank you to the zombies that trudged with me for the long long long walk out of the park.
Strava: All my friends are in New York