Origin Private File System for the WASM version #113

Open
pedrovgs opened this issue May 30, 2024 · 8 comments

@pedrovgs

Hi @paulocoutinhox. First of all, thank you so much for this awesome project!

I've been checking the WASM version, including the provided demo, and I'm wondering whether you have tried loading files from OPFS instead of from a byte array. I'm trying to do this but can't get it working, and I don't even know if it is supported, so I'm asking just in case. Thanks!

@paulocoutinhox
Owner

Hi @pedrovgs,

You can use a function like this to read from OPFS:

async function fileToByteArray(fileName) {
    try {
        // Get directory handle
        const dirHandle = await navigator.storage.getDirectory();
        
        // Get file handle
        const fileHandle = await dirHandle.getFileHandle(fileName);
        
        // Get file from file handle
        const file = await fileHandle.getFile();
        
        // Read file as array buffer
        const arrayBuffer = await file.arrayBuffer();
        
        // View the array buffer as bytes (a Uint8Array avoids copying into a plain Array)
        return new Uint8Array(arrayBuffer);
    } catch (e) {
        // Rethrow the original error so its type and stack are preserved
        throw e;
    }
}

// Usage example
(async () => {
    try {
        const byteArray = await fileToByteArray('example-file.txt');
        console.log(byteArray);
    } catch (e) {
        console.error(e);
    }
})();

Thanks.

@pedrovgs
Author

Thanks for the quick response, @paulocoutinhox! That's the solution we are already using, but we would like to avoid it: reading the array buffer from the file means allocating the entire file content in memory, and for big PDFs that can be many megabytes. We would like to load the document from the file system by path so we can avoid that allocation where possible. Do you know if this can be done? AFAIK Emscripten should provide access to the file system, but I don't know whether it supports OPFS.

The LoadDocument function is available and can be invoked, but in my tests the document load always fails.
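
To make the memory concern concrete, here is a minimal sketch of reading only a byte range from an OPFS file instead of the whole content. It relies on the standard Blob.slice() and arrayBuffer() methods; the helper names readRange and openOpfsFile are hypothetical and not part of this project's API, and how the bytes would then be handed to PDFium is not covered here.

```javascript
// Read `size` bytes starting at `pos` from a Blob or File without
// loading the rest of the file. Returns an empty array when out of range.
async function readRange(blob, pos, size) {
  const end = Math.min(pos + size, blob.size);
  if (pos < 0 || pos >= end) return new Uint8Array(0);
  const buffer = await blob.slice(pos, end).arrayBuffer();
  return new Uint8Array(buffer);
}

// Hypothetical helper (browser only): get the File object for an OPFS entry.
async function openOpfsFile(fileName) {
  const root = await navigator.storage.getDirectory();
  const handle = await root.getFileHandle(fileName);
  return handle.getFile();
}
```

Usage would look like `const file = await openOpfsFile('example-file.pdf'); const header = await readRange(file, 0, 1024);`, so only the requested 1024 bytes are copied into JavaScript memory.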

@paulocoutinhox
Owner

Hi,

You can use it, check:
emscripten-core/emscripten#15950
and
https://github.com/emscripten-core/emscripten/blob/main/test/wasmfs/wasmfs_opfs.c

Example:

// Load the Emscripten module (built with WasmFS and OPFS support) exactly once.
// Module must be defined before the script runs so onRuntimeInitialized is not missed.
let runtimeReady = null;

function loadEmscriptenModule() {
  if (runtimeReady) return runtimeReady;
  runtimeReady = new Promise((resolve, reject) => {
    globalThis.Module = {
      onRuntimeInitialized() {
        // Mount OPFS. This assumes the build exposes FS and an OPFS filesystem to
        // JavaScript; in the linked WasmFS test the backend is created from C instead.
        FS.mkdir('/opfs');
        FS.mount(OPFS, {}, '/opfs');
        resolve();
      }
    };
    const script = document.createElement('script');
    script.src = 'path/to/your/emscripten/module.js';
    script.onerror = reject;
    document.body.appendChild(script);
  });
  return runtimeReady;
}

async function fileToByteArray(fileName) {
  // Ensure Emscripten is loaded and initialized
  await loadEmscriptenModule();

  // Read file from OPFS
  const filePath = `/opfs/${fileName}`;
  const fileByteArray = FS.readFile(filePath, { encoding: 'binary' });

  return Array.from(fileByteArray);
}

// Usage example
(async () => {
  try {
    const byteArray = await fileToByteArray('example-file.pdf');
    console.log(byteArray);
  } catch (e) {
    console.error(e);
  }
})();

@paulocoutinhox
Owner

Hi,

The project was updated to the latest version.

If your problem is solved, can you please close the issue?

Thanks.

@pedrovgs
Author

Hey @paulocoutinhox, I'm going to close the issue, but I don't think this solves the problem we had. We are looking for an API that doesn't require us to read the whole file from OPFS at once, because we handle huge PDFs and that can consume a lot of memory.

@paulocoutinhox
Owner

paulocoutinhox commented Aug 21, 2024

Why not use this:

#include "fpdfview.h"
#include "fpdf_doc.h"
#include "fpdf_text.h"

#include <iostream>
#include <fstream>

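// custom read callback: m_GetBlock must return int,
// non-zero on success and 0 on failure
int readBlock(void* param, unsigned long pos, unsigned char* pBuf, unsigned long size) {
    std::ifstream* file = static_cast<std::ifstream*>(param);
    file->clear(); // reset eof/fail flags left by a previous read
    if (!file->seekg(pos)) return 0;
    file->read(reinterpret_cast<char*>(pBuf), size);
    return file->gcount() == static_cast<std::streamsize>(size) ? 1 : 0;
}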

int main() {
    // initialize pdfium
    FPDF_InitLibrary();

    std::ifstream file("your_file.pdf", std::ios::binary);
    if (!file.is_open()) {
        std::cerr << "failed to open the pdf file." << std::endl;
        FPDF_DestroyLibrary();
        return -1;
    }

    // configure file access (zero-initialize so no field is left indeterminate)
    FPDF_FILEACCESS fileAccess{};
    file.seekg(0, std::ios::end);
    fileAccess.m_FileLen = static_cast<unsigned long>(file.tellg());
    file.seekg(0, std::ios::beg);
    fileAccess.m_GetBlock = &readBlock;
    fileAccess.m_Param = &file;

    // load the pdf document using custom read access
    FPDF_DOCUMENT document = FPDF_LoadCustomDocument(&fileAccess, nullptr);
    if (!document) {
        std::cerr << "failed to load the pdf document." << std::endl;
        FPDF_DestroyLibrary();
        return -1;
    }

    // get the total number of pages
    int page_count = FPDF_GetPageCount(document);
    std::cout << "total number of pages: " << page_count << std::endl;

    // load each page in turn (index 0 is page 1)
    for (int i = 0; i < page_count; ++i) {
        FPDF_PAGE page = FPDF_LoadPage(document, i);
        if (!page) {
            std::cerr << "failed to load page " << i + 1 << "." << std::endl;
            continue;
        }

        std::cout << "page " << i + 1 << " loaded successfully." << std::endl;

        // here you can perform actions with the loaded page
        // such as extracting text, rendering, etc.

        // don't forget to close the page after use
        FPDF_ClosePage(page);
    }

    // close the document after finishing
    FPDF_CloseDocument(document);
    
    // destroy pdfium
    FPDF_DestroyLibrary();

    return 0;
}
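
For the browser side, the same block-read contract could be sketched in JavaScript on top of an OPFS sync access handle. This is only an illustration under assumptions: makeGetBlock and getHeap are hypothetical names, the code must run in a dedicated worker (where createSyncAccessHandle() is available), and wiring the returned function into FPDF_FILEACCESS from the WASM build is not shown.

```javascript
// Build a block reader with the same contract as m_GetBlock:
// copy `size` bytes from file offset `pos` into WASM memory at `ptr`,
// returning 1 on success and 0 on failure.
// `syncHandle` is expected to behave like a FileSystemSyncAccessHandle.
// `getHeap` is a hypothetical accessor returning the current Uint8Array
// view of WASM memory (for example () => Module.HEAPU8).
function makeGetBlock(syncHandle, getHeap) {
  return function getBlock(pos, ptr, size) {
    const heap = getHeap(); // re-fetch: the view changes if memory grows
    if (ptr < 0 || ptr + size > heap.length) return 0;
    const target = heap.subarray(ptr, ptr + size);
    let done = 0;
    while (done < size) {
      const n = syncHandle.read(target.subarray(done), { at: pos + done });
      if (n <= 0) return 0; // short read past end of file: treat as failure
      done += n;
    }
    return 1;
  };
}
```

With this shape only the blocks PDFium asks for are copied into WASM memory, which is the behaviour the C++ example above gets from std::ifstream.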

@paulocoutinhox
Owner

Hi @CetinSert

Can you help me switch to "LoadCustomDocument"?

I made a small change to test it:
#129

But I'm getting an error.

Thanks.

@CetinSert

Hi! I will take a look soon and edit this comment with what I come up with!
