Origin Private File System for the WASM version #113

pedrovgs · 2024-05-30T15:06:03Z

Hi @paulocoutinhox. First of all, thank you so much for this awesome project!

I've been looking at the WASM version, including the provided demo, and I'm wondering whether you have tried loading files from OPFS instead of from a byte array. I'm trying to do this but can't get it working, and I don't even know whether it is supported, so I'm asking just in case. Thanks!

paulocoutinhox · 2024-05-30T18:36:31Z

Hi @pedrovgs,

You can use a function like this to read from OPFS:

async function fileToByteArray(fileName) {
    // Get directory handle (root of the origin private file system)
    const dirHandle = await navigator.storage.getDirectory();

    // Get file handle
    const fileHandle = await dirHandle.getFileHandle(fileName);

    // Get file from file handle
    const file = await fileHandle.getFile();

    // Read file as array buffer
    const arrayBuffer = await file.arrayBuffer();

    // View the array buffer as a byte array without copying it.
    // Errors are left to propagate so the caller sees the original exception.
    return new Uint8Array(arrayBuffer);
}

// Usage example
(async () => {
    try {
        const byteArray = await fileToByteArray('example-file.txt');
        console.log(byteArray);
    } catch (e) {
        console.error(e);
    }
})();

Thanks.

pedrovgs · 2024-05-30T18:55:40Z

Thanks for the quick response, @paulocoutinhox! That's the solution we are already using. We would like to avoid it, though, because reading the array buffer means allocating the entire file content in memory, and for big PDFs that can be many megabytes. We would like to load the document through the API by file-system path so that we can avoid that allocation where possible. Do you know if this is possible? AFAIK Emscripten should provide access to the file system, but I don't know whether it supports OPFS.

The LoadDocument function is available and can be invoked, but the document load always failed when I tested it.

paulocoutinhox · 2024-05-30T19:32:21Z

Hi,

You can use it, check:
emscripten-core/emscripten#15950
and
https://github.com/emscripten-core/emscripten/blob/main/test/wasmfs/wasmfs_opfs.c

Example:

// Initialize Emscripten with WasmFS and OPFS support

// Resolves once the runtime is initialized and OPFS is mounted
let resolveRuntimeReady;
const runtimeReady = new Promise((resolve) => {
  resolveRuntimeReady = resolve;
});

// Use `var`, not `const`: the generated script declares `var Module` itself,
// and a global `const Module` would turn that into a redeclaration error.
var Module = {
  onRuntimeInitialized: function() {
    // Mount OPFS
    FS.mkdir('/opfs');
    FS.mount(OPFS, {}, '/opfs');
    resolveRuntimeReady();
  }
};

function loadEmscriptenModule() {
  return new Promise((resolve, reject) => {
    const script = document.createElement('script');
    script.src = 'path/to/your/emscripten/module.js';
    script.onload = () => resolve();
    script.onerror = reject;
    document.body.appendChild(script);
  });
}

async function fileToByteArray(fileName) {
  // Ensure Emscripten is loaded
  await loadEmscriptenModule();

  // Wait for Emscripten initialization without replacing the
  // onRuntimeInitialized handler above (that would skip the mount)
  await runtimeReady;

  // Read file from OPFS; this returns a Uint8Array with the whole file
  const filePath = `/opfs/${fileName}`;
  return FS.readFile(filePath, { encoding: 'binary' });
}

// Usage example
(async () => {
  try {
    const byteArray = await fileToByteArray('example-file.pdf');
    console.log(byteArray);
  } catch (e) {
    console.error(e);
  }
})();

paulocoutinhox · 2024-08-15T18:00:20Z

Hi,

The project was updated to the latest version.

If your problem is solved, could you please close the issue?

Thanks.

pedrovgs · 2024-08-21T06:23:38Z

Hey @paulocoutinhox, I'm going to close the issue, but I don't think this solves the problem we had. We are looking for an API that doesn't require us to read the whole file from OPFS at once, because we handle huge PDFs and that can consume a lot of memory.

paulocoutinhox · 2024-08-21T17:22:18Z

Why not use this:

#include "fpdfview.h"
#include "fpdf_doc.h"
#include "fpdf_text.h"

#include <iostream>
#include <fstream>

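// custom read callback: m_GetBlock expects an int return value,
// non-zero on success and 0 on failure
int readBlock(void* param, unsigned long pos, unsigned char* pBuf, unsigned long size) {
    std::ifstream* file = static_cast<std::ifstream*>(param);
    file->clear();  // reset eof/fail flags left by a previous short read
    if (!file->seekg(pos)) return 0;
    file->read(reinterpret_cast<char*>(pBuf), size);
    // succeed only if the whole requested block was read
    return static_cast<unsigned long>(file->gcount()) == size ? 1 : 0;
}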

int main() {
    // initialize pdfium
    FPDF_InitLibrary();

    std::ifstream file("your_file.pdf", std::ios::binary);
    if (!file.is_open()) {
        std::cerr << "failed to open the pdf file." << std::endl;
        return -1;
    }

    // configure file access (zero-initialize so no member is left indeterminate)
    FPDF_FILEACCESS fileAccess{};
    fileAccess.m_FileLen = static_cast<unsigned long>(file.seekg(0, std::ios::end).tellg());
    file.seekg(0, std::ios::beg);
    fileAccess.m_GetBlock = &readBlock;
    fileAccess.m_Param = &file;

    // load the pdf document using custom read access
    FPDF_DOCUMENT document = FPDF_LoadCustomDocument(&fileAccess, nullptr);
    if (!document) {
        std::cerr << "failed to load the pdf document." << std::endl;
        FPDF_DestroyLibrary();
        return -1;
    }

    // get the total number of pages
    int page_count = FPDF_GetPageCount(document);
    std::cout << "total number of pages: " << page_count << std::endl;

    // load each page in turn
    for (int i = 0; i < page_count; ++i) {
        FPDF_PAGE page = FPDF_LoadPage(document, i);
        if (!page) {
            std::cerr << "failed to load page " << i + 1 << "." << std::endl;
            continue;
        }

        std::cout << "page " << i + 1 << " loaded successfully." << std::endl;

        // here you can perform actions with the loaded page
        // such as extracting text, rendering, etc.

        // don't forget to close the page after use
        FPDF_ClosePage(page);
    }

    // close the document after finishing
    FPDF_CloseDocument(document);
    
    // destroy pdfium
    FPDF_DestroyLibrary();

    return 0;
}

paulocoutinhox · 2024-08-22T04:32:59Z

Hi @CetinSert

Can you help me change to "LoadCustomDocument"?

I made a small change to test it:
#129

But I'm getting an error.

Thanks.

CetinSert · 2024-08-22T04:34:50Z

Hi! I will take a look soon and edit this comment with what I come up with!

pedrovgs closed this as completed Aug 21, 2024

paulocoutinhox reopened this Aug 22, 2024
