Simplify retrieval of book full-text, get rid of rsync, prefer epub3 to epub by benoit74 · Pull Request #242 · openzim/gutenberg · GitHub
Conversation

@benoit74
Collaborator

@benoit74 benoit74 commented Mar 28, 2025

Fix #97
Fix #219
Fix #235
Fix #160

Changes:

@benoit74 benoit74 self-assigned this Mar 28, 2025
@benoit74 benoit74 force-pushed the simplify_full_text_retrieval branch from 3b7e831 to 1978122 Compare March 28, 2025 10:10
@benoit74 benoit74 changed the title Simplify retrieval of book full-text Simplify retrieval of book full-text, get rid of rsync, prefer epub3 to epub Mar 28, 2025
@benoit74 benoit74 force-pushed the simplify_full_text_retrieval branch from 1978122 to 066382a Compare March 28, 2025 16:15
@benoit74 benoit74 marked this pull request as ready for review March 28, 2025 16:17
@benoit74 benoit74 requested a review from rgaudin March 28, 2025 16:17
Member

@rgaudin rgaudin left a comment

LGTM; not tested.

I saw a mention of multiple passes; do we still do that? How so?

@benoit74
Collaborator Author

benoit74 commented Apr 1, 2025

not tested.

Of course, this will be done before publishing ZIMs again anyway.

I saw a mention of multiple passes; do we still do that? How so?

Where? We don't, AFAIK ^^

unoptimized_fpath = unoptimized_dir / fname_for(book, book_format)
logger.debug(f"Processing {book_format}")

# if we already know (e.g. due to a former pass) that this book format is not
Member

here

Collaborator Author

@benoit74 benoit74 Apr 1, 2025

OK, got it.

We don’t do it on the Zimfarm, but the scraper is mostly capable of supporting it, typically when you run the tool twice with different settings while keeping temporary files. So I made sure I did not introduce any regression in this “functionality”, and slightly optimized the code’s behavior at this download stage.
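
For readers unfamiliar with the idea, here is a minimal sketch of what "supporting multiple passes" could look like at the download stage. It is not the scraper's actual code: `fname_for` below is a simplified stand-in with a made-up naming scheme, and `plan_download` and `known_missing` are hypothetical names introduced only for illustration. It assumes temporary files from a former pass are kept on disk.

```python
from pathlib import Path
import tempfile


def fname_for(book_id: int, book_format: str) -> str:
    # Hypothetical naming scheme, NOT the one used by the real scraper.
    return f"{book_id}.{book_format}"


def plan_download(
    book_id: int,
    book_format: str,
    unoptimized_dir: Path,
    known_missing: set[tuple[int, str]],
) -> str:
    """Decide what to do for one book format, given leftovers of a former pass."""
    # A former pass already found that this format does not exist upstream:
    # do not query the server again.
    if (book_id, book_format) in known_missing:
        return "skip"
    # A former pass already fetched the file and temporary files were kept:
    # reuse it instead of downloading again.
    if (unoptimized_dir / fname_for(book_id, book_format)).exists():
        return "reuse"
    # Nothing is known about this format yet: download it.
    return "download"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        (workdir / fname_for(1342, "epub")).touch()  # left over from a first pass
        missing = {(1342, "pdf")}  # first pass found no PDF for this book
        for fmt in ("epub", "pdf", "html"):
            print(fmt, plan_download(1342, fmt, workdir, missing))
```

With such a check, a second run with different settings only downloads what the first run neither fetched nor marked as unavailable.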

@benoit74 benoit74 merged commit 6049ced into main Apr 1, 2025
5 checks passed
@benoit74 benoit74 deleted the simplify_full_text_retrieval branch April 1, 2025 07:53

3 participants