Standalone export HTML

Post by **Jerther** » Tue May 24, 2022 4:46 pm

Hi!!

I have embedded the Gantt diagram inside an existing framework. To have access to most web resources like images, fonts and css, that framework requires users to be logged in. This all works fine when viewing the diagram.

When exporting, however, the export server, which uses Puppeteer (it opens a headless Chrome to render the HTML), runs into all sorts of problems because of that, compounded by CORS issues. I won't go into the details, but I've been through a lot of trial and error.

Now I've settled on a solution that should eliminate all security problems: make the HTML data standalone. That is, replace all external links to CSS, images, etc. with the actual data, embedded in the HTML sent to the export server. This should also improve the export server's performance.

I've been able to embed all CSS and img tags by using https://www.bryntum.com/docs/gantt/api/Grid/feature/export/exporter/Exporter#function-pageTpl but now I'm struggling with url() in CSS. The problem is that pageTpl() is not async, and I cannot use anything that uses a Promise in there, like fetch() to get the data and update the style rule.

So the only way I can think of would be to do the CSS url() substitution once the page has finished loading in the user's browser, and let the export feature work as usual. But that feels more like a hack, and there's a risk of visual glitches.

Any suggestion?

I'll gladly share what I've come up with when it's done

Post by **Maxim Gorkovsky** » Tue May 24, 2022 5:59 pm

Hello.
Have you tried using clientUrl config? Please see this guide which may explain your use case: viewtopic.php?f=52&t=21367&p=105778#p105778

Speaking of async flow: while pageTpl is synchronous, the export feature uses async methods that you can extend to preload the resources. For instance, you can override the prepareComponent method to preload all the resources:

class MySinglePageExporter extends SinglePageExporter {
  static get type() {
    return 'mysinglepage';
  }

  static get title() {
    // In case locale is missing exporter is still distinguishable
    return 'My single page';
  }

  async prepareComponent(config) {
    await super.prepareComponent(config);
    // grab all style/link nodes
    let stylesheets = this.stylesheets;
    stylesheets = await myPreloadResourcesAndBase64EncodeThem(stylesheets);
    // put them back on exporter, these links/styles will be inserted to the pageTpl
    this.stylesheets = stylesheets;
  }
}

new Grid({
  features: {
    pdfExport: {
      exporters: [MySinglePageExporter]
    }
  }
})
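
For illustration, here is a minimal sketch of what the core of a helper like myPreloadResourcesAndBase64EncodeThem could do for url() references. It is not part of the Bryntum API: the name inlineCssUrls and the fetchAsDataUrl callback are hypothetical, and it assumes you already have the raw CSS text of a stylesheet plus the URL it was loaded from. Resources that fail to load are replaced with an empty URL.

```javascript
// Hypothetical helper (not a Bryntum API): replaces url(...) references in a CSS
// string with data URIs. `fetchAsDataUrl` is an assumed callback with the shape
// (absoluteUrl) => Promise<string>, e.g. built on fetch() and FileReader.
async function inlineCssUrls(cssText, baseUrl, fetchAsDataUrl) {
    const urlPattern = /url\(\s*(["']?)([^"')]+)\1\s*\)/g;

    // Collect unique references, skipping data URIs and fragment-only references
    const references = new Set();
    for (const match of cssText.matchAll(urlPattern)) {
        const ref = match[2];
        if (!ref.startsWith('data:') && !ref.startsWith('#')) {
            references.add(ref);
        }
    }

    // Download each unique resource once, resolving it against the stylesheet URL
    const inlined = new Map();
    await Promise.all([...references].map(async ref => {
        const absoluteUrl = new URL(ref, baseUrl).toString();
        try {
            inlined.set(ref, await fetchAsDataUrl(absoluteUrl));
        } catch (error) {
            // Resource could not be loaded (CORS, 404, ...): fall back to an empty URL
            inlined.set(ref, '');
        }
    }));

    // Rewrite the CSS, leaving untouched anything that was not collected above
    return cssText.replace(urlPattern, (whole, quote, ref) =>
        inlined.has(ref) ? `url("${inlined.get(ref)}")` : whole
    );
}
```

Inside prepareComponent you would await this for each stylesheet's text before assigning the result back to this.stylesheets. Passing the downloader in as a callback keeps the rewriting logic independent of how the data is actually fetched.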

Post by **Jerther** » Wed May 25, 2022 8:30 pm

Neat! Thank you Maxim!

clientUrl works fine, except when some stylesheets (in our framework at least) use the url() CSS function, because those links are not translated.

I almost have something working.

(By the way, the link you gave leads back here.)

Post by **Maxim Gorkovsky** » Thu May 26, 2022 9:40 am

Oh, shame, here's the right one: https://github.com/bryntum/pdf-export-server/blob/main/docs/architecture.md#remote-web-server-remote-export-server

But since you have it almost working with clientUrl, you probably don't need it anymore

Post by **Jerther** » Fri May 27, 2022 5:34 pm

OK, so as a man of my word, here's what I've come up with. It gathers all external CSS and image tags to build standalone HTML data that requires no additional requests on the server side. It works well, but it has two caveats:

1. Most resources are gathered from cache, but CSS url() resources had to be downloaded again with fetch(). Since CDN resources in there are likely to fail because of CORS, I decided to just clear them. Not great, but that's the best I could do, and it had no impact in our specific situation. This is the meat of the code.
2. Because the same resource can end up copied multiple times throughout the HTML, this can lead to a lot of data for every exported page. In our case it was a little more than 20 MB for every page of a 10-page export, so that's 200 MB sent from the client to the export server via POST. It also makes the export server very slow. So it's not exactly fast, and it relies partly on the client's internet connection. This would be less of an issue for very lean Gantts, though.

Maybe there's some way of fixing #1 by accessing some cache, and of optimizing #2 by referencing the same resource instead of copying it everywhere, but at some point I decided to give up on this strategy.

A few notes about the code:

- It uses stylesheet.cssRules. It works with Chrome but may not work with Firefox. It could be or'ed with the deprecated stylesheet.rules.
- jQuery is used to parse HTML data and could easily be replaced by native JS.
- Not the prettiest code, I admit. I would've loved to refactor it into multiple shorter methods, but Override does not add new methods, and I stopped working on this before I learned that Override is not meant for permanent overrides and how to do it with prototypes.
- Not tested against memory leaks.
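
The cssRules / rules fallback mentioned in the first note could look roughly like the sketch below. readRules is a hypothetical helper name, not something the code further down uses. It also guards against the SecurityError that browsers throw when the rules of a cross-origin stylesheet are read without CORS, which seems relevant to the CDN stylesheets discussed above.

```javascript
// Hypothetical helper: returns the rules of a stylesheet as a plain array,
// or an empty array when the rules cannot be read.
function readRules(sheet) {
    try {
        // `cssRules` is the standard property; `rules` is the legacy alias
        const rules = sheet.cssRules || sheet.rules;
        return rules ? Array.from(rules) : [];
    } catch (error) {
        // Reading the rules of a cross-origin sheet throws a SecurityError
        return [];
    }
}
```

With this in place, Array.from(actualSheet.cssRules) could be swapped for readRules(actualSheet), at the cost of silently skipping sheets that cannot be read.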

Although I won't be using this code, a lot of time went into it, so I hope it somehow helps and/or inspires someone.


class ExporterOverride {
    static get target() {
        return { class : Exporter }
    }

    pageTpl(data) {
        function embedImgsInUrl(parent) {
            const images = parent.find('img');
            for (let image of images) {
                const canvas = document.createElement('canvas');
                canvas.width = image.width;
                canvas.height = image.height;
                const ctx = canvas.getContext('2d');
                ctx.drawImage(image, 0, 0);
                image.src = canvas.toDataURL("image/png");
            }
            return parent[0].outerHTML
        }
        
        data.header = embedImgsInUrl($(data.header));
        data.html = embedImgsInUrl($(data.html));
        data.footer = embedImgsInUrl($(data.footer));

        return this._overridden.pageTpl.call(this, data);
    }

    async prepareComponent(config) {
        await this._overridden.prepareComponent.call(this, config);

        const styleSheetNodes = Array.from(document.querySelectorAll('link[rel="stylesheet"], style'));
        const styleSheets = this.stylesheets
            .filter(s => {  // Filter out unnecessary stylesheets
                const href = $(s).prop('href');
                const whitelist = [
                    '/planif/static/',
                    '/web/content/',
                    '/web/static/',
                    '/web_enterprise/static/',
                ]
                return s && (!href || whitelist.some(path => href.includes(path)));
            })
            .map(style => {
                let href = $(style).prop('href');
                const sheet = new CSSStyleSheet();
                if (href) {
                    const actualSheet = styleSheetNodes.find(n => n.href == $(style).prop('href')).sheet;
                    sheet.replaceSync(Array.from(actualSheet.cssRules).map(r => r.cssText).join(' '));
                } else {
                    sheet.replaceSync(style);
                }
                return {sheet, href};
            })
            .filter(sheetDef => sheetDef.sheet.cssRules.length);

        const rulesToProcess = styleSheets.reduce((acc, sheetDef) => {
            acc.push(Array.from(sheetDef.sheet.cssRules)
                .filter(r => r.style)
                .map(rule => ({
                    rule,
                    styles: Array.from(rule.style)
                        .filter(k => rule.style[k] && rule.style[k].startsWith('url'))
                        .map(styleKey => rule.style[styleKey].split(',')
                            .map(s => s.match(/url\(["']?([^"']*)["']?\)/))
                            .filter(m => m)
                            .map(m => m[1])
                            .filter(url => !url.startsWith('data:') && !url.startsWith('#'))
                            .map(url => ({
                                styleKey,
                                originalUrl: url,
                                url: new URL(url.startsWith('/') || url.startsWith('http') ? url : `${sheetDef.href.substring(0, sheetDef.href.lastIndexOf("/"))}/${url}`, document.location.origin).toString()
                             }))
                        )
                        .flat()
                }))
                .filter(pair => pair.styles.length)
            );
            return acc;
        }, [])
        .flat();

        const rulesToFetch = rulesToProcess
            .map(pair => pair.styles)
            .flat()
            .reduce((acc, curr) => {
                if (!acc.some(p => p.originalUrl == curr.originalUrl))
                    acc.push(curr);
                return acc;
            }, []);

        const b64data = await Promise.all(
            Array.from(rulesToFetch).map(async pair => {
                // External URLs cannot be fetched because of CORS, so set them as empty
                if (!pair.originalUrl.startsWith('http') || pair.originalUrl.startsWith(document.location.origin)) {
                    const response = await fetch(pair.url);
                    const blob = await response.blob();
                    return new Promise(resolve => {
                        const reader = new FileReader();
                        reader.onload = function() {
                            resolve({originalUrl: pair.originalUrl, data: this.result});
                        };
                        reader.readAsDataURL(blob);
                    })
                } else {
                    return { originalUrl: pair.originalUrl, data: '' };
                }
            })
        );

        for (const dataPair of b64data) {
            for (const rulePair of rulesToProcess) {
                for (const s of rulePair.styles.filter(s => s.originalUrl == dataPair.originalUrl)) {
                    rulePair.rule.style.setProperty(s.styleKey, rulePair.rule.style[s.styleKey].replace(s.originalUrl, dataPair.data));
                }
            }
        }

        this.stylesheets = styleSheets
            .map(sheetDef => Array.from(sheetDef.sheet.cssRules).map(r => r.cssText).join(' '))
            .map(s => $('<style>').text(s).prop('outerHTML'));
    }
}
Override.apply(ExporterOverride);

A possible new downloadResult() to go along downloadTestCase(), maybe?

Post by **Maxim Gorkovsky** » Mon May 30, 2022 2:39 pm

Thank you for sharing this solution.

A possible new downloadResult() to go along downloadTestCase(), maybe?

What do you need that for? Inspect HTML that goes to the server? It is possible to do in the network tab, you just need to remove first/last quotes and replace \" with " - then you get plain HTML.
Also there is a method you can override and check all the data before request is sent: https://bryntum.com/docs/gantt/api/Grid/feature/export/PdfExport#function-receiveExportContent
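
The manual cleanup of the payload copied from the network tab (stripping the outer quotes and unescaping \") can also be scripted. This is only a sketch: decodeExportedHtml is a made-up name, and it assumes the copied value is a JSON-encoded string, so JSON.parse can handle the outer quotes and all escape sequences at once.

```javascript
// Hypothetical helper: turns a JSON-encoded HTML string, as copied from the
// request payload, back into plain HTML.
function decodeExportedHtml(rawValue) {
    // JSON.parse removes the surrounding quotes and unescapes \" (as well as \n, \\, ...)
    const html = JSON.parse(rawValue.trim());
    if (typeof html !== 'string') {
        throw new TypeError('Expected a JSON-encoded string');
    }
    return html;
}
```

Pasting the copied value into a call to this function in the browser console returns the HTML, which can then be saved to a file and opened directly.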

Post by **Jerther** » Mon May 30, 2022 2:45 pm

Oh, I don't need it myself. It was just an idea off the top of my head.

Thanks for the tip, I was looking for something like receiveExportContent.
