
How to calculate ofc updates 20x faster

Posted: 21.01.2016, 16:27
by hbuhrmester

Processing of dynamic Office updates (ofc) is notoriously slow. This is mostly due to two nested loops that perform numerous line-by-line comparisons of two input files:

DownloadUpdates.cmd, version 10.3.2, lines 1302 - 1309
Code:
for /F "usebackq tokens=1,2 delims=," %%i in ("%TEMP%\OfficeUpdateCabExeIdsAndLocations.txt") do (
  for /F "usebackq tokens=1,2 delims=," %%k in ("%TEMP%\OfficeUpdateAndFileIds.txt") do (
    if /i "%%l"=="%%i" (
      echo %%j>>"%TEMP%\DynamicDownloadLinks-%1-%2.txt"
      echo %%k,%%j>>"%TEMP%\UpdateTableURL-%1-%2.csv"
    )
  )
)
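For readers less familiar with `for /F`, here is a rough Python rendering of what the two nested loops do. It is only an illustration, not code from WSUS Offline Update: the function name `nested_loop_join` and the sample data are hypothetical, and batch details such as `for /F` skipping empty lines are ignored. The point it shows is that the inner file is scanned from the top once for every line of the outer file.

```python
def nested_loop_join(cab_lines, update_lines):
    """Rough Python equivalent of the nested "for /F" loops (illustration only).

    cab_lines:    "FileId,Location" lines (OfficeUpdateCabExeIdsAndLocations.txt)
    update_lines: "UpdateId,FileId" lines (OfficeUpdateAndFileIds.txt)
    """
    download_links = []  # what goes to DynamicDownloadLinks-%1-%2.txt
    update_table = []    # what goes to UpdateTableURL-%1-%2.csv
    inner_scans = 0      # how often the inner file is read from the top
    for cab_line in cab_lines:                      # outer loop: %%i,%%j
        file_id, location = cab_line.split(",")[:2]
        inner_scans += 1                            # inner for /F re-reads the whole file
        for update_line in update_lines:            # inner loop: %%k,%%l
            update_id, upd_file_id = update_line.split(",")[:2]
            if upd_file_id.lower() == file_id.lower():  # if /i "%%l"=="%%i"
                download_links.append(location)
                update_table.append(f"{update_id},{location}")
    return download_links, update_table, inner_scans


# Hypothetical sample data
cab = ["101,http://example.com/a.cab", "102,http://example.com/b.exe"]
upd = ["U1,101", "U2,101", "U3,103"]
links, table, scans = nested_loop_join(cab, upd)
print(links)
print(table)
print(scans)
```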


Typical line counts for the files involved when calculating "ofc glb" are:

Code:
2529 lines - OfficeUpdateCabExeIdsAndLocations.txt
8934 lines - OfficeUpdateAndFileIds.txt
2529 lines - DynamicDownloadLinks-ofc-glb.txt
2595 lines - UpdateTableURL-ofc-glb.csv


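Assuming that the first file is read only once, that the second file is read again for each of the 2529 lines of the first file, and that each "echo ... >>" counts as one write operation, this makes: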

Code:
      read operations:    1 + 2529 = 2530
     write operations: 2529 + 2595 = 5124
total file operations: 2530 + 5124 = 7654
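The same arithmetic as a few lines of Python, using the line counts quoted above (the variable names are made up for this sketch). It also counts how often the inner `if /i` comparison runs, which is simply the product of the two input sizes:

```python
# Line counts for "ofc glb" as quoted in the post
cab_ids_and_locations = 2529  # OfficeUpdateCabExeIdsAndLocations.txt (outer file)
update_and_file_ids = 8934    # OfficeUpdateAndFileIds.txt (inner file)
download_links = 2529         # DynamicDownloadLinks-ofc-glb.txt
update_table_url = 2595       # UpdateTableURL-ofc-glb.csv

reads = 1 + cab_ids_and_locations            # outer file once, inner file once per outer line
writes = download_links + update_table_url   # one "echo ... >>" per output line
total = reads + writes
comparisons = cab_ids_and_locations * update_and_file_ids  # executions of "if /i"

print(reads, writes, total, comparisons)
```

This prints `2530 5124 7654 22594086`: the file operation counts match the figures above, and the inner comparison runs more than 22 million times.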


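If a virus scanner intercepts every file operation, this will surely be slow. But all of this can be replaced by just two calls of "join", provided that both input files are sorted on the FileId field and that the fields of OfficeUpdateAndFileIds.txt are swapped to FileId,UpdateId (both steps are done in the working example below):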

Code:
..\bin\join -t "," -o "1.2" "%TEMP%\OfficeUpdateCabExeIdsAndLocations.txt" "%TEMP%\OfficeUpdateAndFileIds.txt" > "%TEMP%\DynamicDownloadLinks-%1-%2-Unsorted.txt"

..\bin\join -t "," -o "2.2,1.2" "%TEMP%\OfficeUpdateCabExeIdsAndLocations.txt" "%TEMP%\OfficeUpdateAndFileIds.txt" > "%TEMP%\UpdateTableURL-%1-%2.csv"
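The reason join is so much faster is that it performs a merge join: with both inputs sorted on the join field, it walks through each file only once instead of rescanning one file for every line of the other. The Python function below is a minimal sketch of that idea, not GNU join's actual implementation. `merge_join` is a hypothetical name; it assumes every line is `key,value` and both lists are already sorted by key in byte order (as with `LC_ALL=C`). File 1 is assumed to hold `FileId,Location` and file 2 `FileId,UpdateId`, matching the field order used in the full script below.

```python
def merge_join(left, right):
    """Sort-merge join of two lists of "key,value" lines sorted by key.

    Returns (key, left_value, right_value) for every matching pair,
    including all combinations when a key repeats on both sides.
    """
    left = [line.split(",", 1) for line in left]
    right = [line.split(",", 1) for line in right]
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            i += 1          # left key too small: advance left
        elif lkey > rkey:
            j += 1          # right key too small: advance right
        else:
            # Find the run of equal keys on both sides
            i_end = i
            while i_end < len(left) and left[i_end][0] == lkey:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == rkey:
                j_end += 1
            for _, lval in left[i:i_end]:
                for _, rval in right[j:j_end]:
                    out.append((lkey, lval, rval))
            i, j = i_end, j_end
    return out


# Hypothetical sample data, both sorted by FileId
locations = ["101,http://example.com/a.cab", "102,http://example.com/b.exe"]
file_to_update = ["101,U1", "101,U2", "103,U3"]
pairs = merge_join(locations, file_to_update)
links = [loc for _, loc, _ in pairs]               # like -o "1.2"
table = [f"{upd},{loc}" for _, loc, upd in pairs]  # like -o "2.2,1.2"
print(links)
print(table)
```

A location that belongs to several updates appears once per update, which is presumably why the full script pipes the "1.2" output through `gsort --unique`.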


"join" is a classic Unix command and part of the GNU Core Utilities package. Windows ports are available from the GnuWin32 project:

https://en.wikipedia.org/wiki/Join_%28Unix%29
https://en.wikipedia.org/wiki/GNU_Core_Utilities
https://en.wikipedia.org/wiki/GnuWin32

So they can probably be included in WSUS Offline Update, just like wget and cabextract. A working example would be:

Code:
@echo off

setlocal enableextensions enabledelayedexpansion
if errorlevel 1 goto NoExtensions
set CSCRIPT_PATH=%SystemRoot%\System32\cscript.exe

rem This is an excerpt from the function DownloadCore for the calculation
rem of dynamic Office updates. It covers the section from the label
rem :DetermineOffice to :ExcludeOffice. It assumes that the file
rem "%TEMP%\package.xml" has already been created.
rem
rem The GNU utilities gsort and join from the GnuWin32 project must be
rem placed in ..\bin. "gsort.exe" is GNU "sort.exe", only renamed to avoid
rem confusion with the Windows sort command. Two libraries are also needed,
rem so there are four new files in ..\bin:
rem
rem gsort.exe, join.exe, libiconv2.dll, libintl3.dll
rem
rem For comparing and extracting fields with join, both input files must be
rem sorted. Both GNU utilities should honor the environment variable LC_ALL
rem for sorting and comparison.
set LC_ALL=C

set LANG_SHORT=
call :DetermineOffice ofc glb
set LANG_SHORT=en
call :DetermineOffice ofc enu
set LANG_SHORT=de
call :DetermineOffice ofc deu
goto EoF

:DetermineOffice
rem *** Determine dynamic update urls for %1 %2 ***
echo %TIME% - Determining dynamic update urls for %1 %2 (please be patient, this will take a while)...

%CSCRIPT_PATH% //Nologo //E:vbs XSLT.vbs "%TEMP%\package.xml" ..\xslt\ExtractUpdateCategoriesAndFileIds.xsl "%TEMP%\UpdateCategoriesAndFileIds.txt"
if errorlevel 1 goto DownloadError
%CSCRIPT_PATH% //Nologo //E:vbs XSLT.vbs "%TEMP%\package.xml" ..\xslt\ExtractUpdateCabExeIdsAndLocations.xsl "%TEMP%\UpdateCabExeIdsAndLocations-Unsorted.txt"
if errorlevel 1 goto DownloadError

rem Sort the file using GNU sort
..\bin\gsort --unique "%TEMP%\UpdateCabExeIdsAndLocations-Unsorted.txt" > "%TEMP%\UpdateCabExeIdsAndLocations.txt"
del "%TEMP%\UpdateCabExeIdsAndLocations-Unsorted.txt"

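rem Remove leftovers of an aborted run, since the loop below appends with >>
if exist "%TEMP%\OfficeUpdateAndFileIds-Unsorted.txt" del "%TEMP%\OfficeUpdateAndFileIds-Unsorted.txt"
if exist "%TEMP%\OfficeFileIds-Unsorted.txt" del "%TEMP%\OfficeFileIds-Unsorted.txt"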
set UPDATE_ID=
set UPDATE_CATEGORY=
set UPDATE_LANGUAGES=
for /F "usebackq tokens=1,2 delims=;" %%i in ("%TEMP%\UpdateCategoriesAndFileIds.txt") do (
  if "%%j"=="" (
    if "!UPDATE_CATEGORY!"=="477b856e-65c4-4473-b621-a8b230bb70d9" (
      for /F "tokens=1-3 delims=," %%k in ("%%i") do (
        if "%%l" NEQ "" (
          if /i "%2"=="glb" (
            if "!UPDATE_LANGUAGES!_%%m"=="_" (
              rem Swap the field order in OfficeUpdateAndFileIds.txt
              echo %%l,!UPDATE_ID!>>"%TEMP%\OfficeUpdateAndFileIds-Unsorted.txt"
              echo %%l>>"%TEMP%\OfficeFileIds-Unsorted.txt"
            )
            if "!UPDATE_LANGUAGES!_%%m"=="en_en" (
              echo %%l,!UPDATE_ID!>>"%TEMP%\OfficeUpdateAndFileIds-Unsorted.txt"
              echo %%l>>"%TEMP%\OfficeFileIds-Unsorted.txt"
            )
          ) else (
            if "%%m"=="%LANG_SHORT%" (
              echo %%l,!UPDATE_ID!>>"%TEMP%\OfficeUpdateAndFileIds-Unsorted.txt"
              echo %%l>>"%TEMP%\OfficeFileIds-Unsorted.txt"
            )
          )
        )
      )
    )
  ) else (
    for /F "tokens=1 delims=," %%k in ("%%i") do (
      set UPDATE_ID=%%k
    )
    for /F "tokens=1* delims=," %%k in ("%%j") do (
      set UPDATE_CATEGORY=%%k
      set UPDATE_LANGUAGES=%%l
    )
  )
)
set UPDATE_ID=
set UPDATE_CATEGORY=
set UPDATE_LANGUAGES=
del "%TEMP%\UpdateCategoriesAndFileIds.txt"

rem Sort both output files using GNU sort
..\bin\gsort --unique "%TEMP%\OfficeUpdateAndFileIds-Unsorted.txt" > "%TEMP%\OfficeUpdateAndFileIds.txt"
..\bin\gsort --unique "%TEMP%\OfficeFileIds-Unsorted.txt" > "%TEMP%\OfficeFileIds.txt"
del "%TEMP%\OfficeUpdateAndFileIds-Unsorted.txt"
del "%TEMP%\OfficeFileIds-Unsorted.txt"

rem Field order
rem File 1: OfficeFileIds.txt
rem  - Field 1: FileId
rem File 2: UpdateCabExeIdsAndLocations.txt
rem  - Field 1: FileId
rem  - Field 2: Location (URL)

rem Write FileIds and Locations. Since both input files are sorted by the
rem first field, the output file will also be sorted.
..\bin\join -t "," "%TEMP%\OfficeFileIds.txt" "%TEMP%\UpdateCabExeIdsAndLocations.txt" > "%TEMP%\OfficeUpdateCabExeIdsAndLocations.txt"

del "%TEMP%\OfficeFileIds.txt"
del "%TEMP%\UpdateCabExeIdsAndLocations.txt"

rem Revised field order
rem File 1: OfficeUpdateCabExeIdsAndLocations.txt
rem  - Field 1.1: FileId
rem  - Field 1.2: Location (URL)
rem File 2: OfficeUpdateAndFileIds.txt
rem  - Field 2.1: FileId
rem  - Field 2.2: Bundle UpdateId

rem Write Locations only
..\bin\join -t "," -o "1.2" "%TEMP%\OfficeUpdateCabExeIdsAndLocations.txt" "%TEMP%\OfficeUpdateAndFileIds.txt" > "%TEMP%\DynamicDownloadLinks-%1-%2-Unsorted.txt"
..\bin\gsort --unique "%TEMP%\DynamicDownloadLinks-%1-%2-Unsorted.txt" > "%TEMP%\DynamicDownloadLinks-%1-%2.txt"
rem Write UpdateIds and Locations
..\bin\join -t "," -o "2.2,1.2" "%TEMP%\OfficeUpdateCabExeIdsAndLocations.txt" "%TEMP%\OfficeUpdateAndFileIds.txt" > "%TEMP%\UpdateTableURL-%1-%2.csv"

del "%TEMP%\OfficeUpdateAndFileIds.txt"
del "%TEMP%\OfficeUpdateCabExeIdsAndLocations.txt"
rem del "%TEMP%\DynamicDownloadLinks-%1-%2-Unsorted.txt"

if not exist ..\client\ofc\nul md ..\client\ofc
%CSCRIPT_PATH% //Nologo //E:vbs ExtractIdsAndFileNames.vbs "%TEMP%\UpdateTableURL-%1-%2.csv" ..\client\ofc\UpdateTable-%1-%2.csv
del "%TEMP%\UpdateTableURL-%1-%2.csv"

echo %TIME% - Done.
:ExcludeOffice
goto :eof


:NoExtensions
:DownloadError
:EoF
endlocal


This calculates the three files DynamicDownloadLinks-ofc-glb.txt, DynamicDownloadLinks-ofc-enu.txt and DynamicDownloadLinks-ofc-deu.txt in record time.
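One detail in the script above: the input files are sorted as whole lines (`gsort --unique` with `LC_ALL=C`), while join compares only the first comma-separated field. The two orders agree as long as every character that can occur in a key sorts above `,` (0x2C) in byte order, which holds for decimal digits. The Python check below illustrates this; it assumes the FileIds are purely numeric, which the thread does not state, and `sorted_by_first_field` is a hypothetical helper.

```python
def sorted_by_first_field(lines):
    """True if the lines are ordered by their first comma-separated field."""
    keys = [line.split(",", 1)[0] for line in lines]
    return keys == sorted(keys)


# Whole-line byte-order sort, as GNU sort does with LC_ALL=C.
# Numeric keys: "," (0x2C) sorts below "0"-"9" (0x30-0x39), so both orders agree.
digits = sorted(["10,b", "1,a", "100,c"])
print(digits, sorted_by_first_field(digits))

# A key containing a character below "," (here "+", 0x2B) breaks the agreement.
tricky = sorted(["a,z", "a+,y"])
print(tricky, sorted_by_first_field(tricky))
```

If keys could contain such characters, sorting explicitly on the first field would be the safer choice.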

Re: How to calculate ofc updates 20x faster

Posted: 28.01.2016, 22:17
by WSUSUpdateAdmin
Hi Hartmut,

wow, that's brilliant! :D

Since I'm not familiar with Unix commands, I had never noticed "join", which is perfect for the job. :)

I prefer UnxUtils (http://sourceforge.net/projects/unxutils/) because it has fewer (DLL) dependencies.

I have also halved the time taken by the first loop.

So with the "old-fashioned" method, we got:
Code:
18:29:38,50 - Point 1
18:33:02,05 - Point 2
18:33:11,41 - Point 3
18:33:12,79 - Point 4
18:33:12,80 - Point 2
18:33:21,54 - Point 3
18:33:21,66 - Point 4
18:33:21,68 - Point 2
18:33:43,47 - Point 3
18:37:28,35 - Point 4

...whereas with the "new-style" method:
Code:
18:37:58,77 - Point 1
18:41:37,40 - Point 2
18:41:43,39 - Point 3
18:41:43,42 - Point 4
18:41:43,45 - Point 2
18:41:48,69 - Point 3
18:41:48,71 - Point 4
18:41:48,74 - Point 2
18:41:58,80 - Point 3
18:41:59,13 - Point 4

That's an improvement! :)
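To put numbers on the two timing logs, the Python sketch below converts the timestamps (German `%TIME%` format, `HH:MM:SS,cc`) into elapsed time. The thread does not say what each "Point" marks, so the sketch only measures the span from the first to the last line and from the first "Point 2" to the last line. It assumes that neither run crosses midnight; the function names are hypothetical.

```python
OLD = """\
18:29:38,50 - Point 1
18:33:02,05 - Point 2
18:33:11,41 - Point 3
18:33:12,79 - Point 4
18:33:12,80 - Point 2
18:33:21,54 - Point 3
18:33:21,66 - Point 4
18:33:21,68 - Point 2
18:33:43,47 - Point 3
18:37:28,35 - Point 4
"""

NEW = """\
18:37:58,77 - Point 1
18:41:37,40 - Point 2
18:41:43,39 - Point 3
18:41:43,42 - Point 4
18:41:43,45 - Point 2
18:41:48,69 - Point 3
18:41:48,71 - Point 4
18:41:48,74 - Point 2
18:41:58,80 - Point 3
18:41:59,13 - Point 4
"""


def to_centiseconds(stamp):
    """Convert "HH:MM:SS,cc" into hundredths of a second since midnight."""
    hms, cs = stamp.split(",")
    h, m, s = (int(x) for x in hms.split(":"))
    return ((h * 60 + m) * 60 + s) * 100 + int(cs)


def summarize(log):
    """Return (total, after_first_point_2) in centiseconds."""
    t = [to_centiseconds(line.split(" - ")[0]) for line in log.strip().splitlines()]
    return t[-1] - t[0], t[-1] - t[1]


for name, log in (("old-fashioned", OLD), ("new-style", NEW)):
    total, rest = summarize(log)
    print(f"{name}: total {total / 100:.2f} s, after first Point 2 {rest / 100:.2f} s")
```

This prints a total of 469.85 s for the old-fashioned run and 240.36 s for the new-style run. Most of the remaining time is the first Point 1 to Point 2 segment (about 204 s and 219 s respectively); from the first Point 2 onward, the time drops from 266.30 s to 21.73 s, roughly 12 times faster.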

Everything will be checked in soon with r724.

Many, many thanks & kind regards,
Torsten

Re: How to calculate ofc updates 20x faster

Posted: 29.01.2016, 01:47
by boco
Will that apply to all routines or only to ofc?

Re: How to calculate ofc updates 20x faster

Posted: 29.01.2016, 07:45
by WSUSUpdateAdmin
Hi!

It's an improvement for ofc calculation only.

OS calculation did not require that nested loop and therefore can't be improved this way, but it isn't that slow, is it?

Regards
Torsten

Re: How to calculate ofc updates 20x faster

Posted: 29.01.2016, 17:36
by hbuhrmester
Hello,

shouldn't the file "%TEMP%\UpdateTable-%1-%2.csv" actually be created directly in the directory ..\client\ofc, as in earlier versions of the script?

Code:
if not exist ..\client\ofc\nul md ..\client\ofc
%CSCRIPT_PATH% //Nologo //E:vbs ExtractIdsAndFileNames.vbs "%TEMP%\UpdateTableURL-%1-%2.csv" ..\client\ofc\UpdateTable-%1-%2.csv
del "%TEMP%\UpdateTableURL-%1-%2.csv"


I had not changed these statements; I simply copied them in order to show the entire section between the labels DetermineOffice and ExcludeOffice in context.

The file UpdateTable-%1-%2.csv is ultimately used by the script client\cmd\ListUpdatesToInstall.cmd, so it does belong in client\ofc.

Re: How to calculate ofc updates 20x faster

Posted: 30.01.2016, 00:22
by boco
WSUSUpdateAdmin wrote:
Hi!

It's an improvement for ofc calculation only.

OS calculation did not require that nested loop and therefore can't be improved this way, but it isn't that slow, is it?

Regards
Torsten
Nope, it was only a technical question.

The frustrating part is not determining and gathering the Windows patches. The really frustrating part is the installation on Windows 7 or 8 (as you all know by now).

Re: How to calculate ofc updates 20x faster

Posted: 30.01.2016, 09:25
by WSUSUpdateAdmin
Hi!

hbuhrmester wrote: Shouldn't the file "%TEMP%\UpdateTable-%1-%2.csv" actually be created directly in the directory ..\client\ofc, as in earlier versions of the script? [...]


Yes, that's a silly copy-paste error.
For testing, I had first worked in ExtractUpdateIdsAndDownloadLinks-ofc.cmd and then transferred the block "carefully" ;).

Fixed.

Regards
Torsten

Re: How to calculate ofc updates 20x faster

Posted: 30.01.2016, 20:20
by hbuhrmester
Hello,

now the only thing still missing is this line:

Code:
if not exist ..\client\ofc\nul md ..\client\ofc


I briefly wondered whether the VBScript ExtractIdsAndFileNames.vbs might create the folder itself, but that does not seem to work (the German error message reads "Microsoft VBScript runtime error: Path not found"):

Code:
19:04:28,40 - Determining dynamic update urls for ofc enu (please be patient, this will take a while)...
C:\trunk-r727\cmd\ExtractIdsAndFileNames.vbs(54, 1) Laufzeitfehler in Microsoft VBScript: Der Pfad wurde nicht gefunden.

19:06:39,06 - Done.
Downloading/validating statically defined updates for ofc enu...
Downloading/validating dynamically determined updates for ofc enu...
Downloading/validating update 1 of 12...


As a result, the file UpdateTable-ofc-enu.csv is missing as well. Immediately afterwards, however, wget creates the folder ..\ofc\enu. That is a peculiarity of wget which other downloaders do not necessarily share.
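The failure above comes down to writing a file into a directory that does not exist yet, which is exactly what the line `if not exist ..\client\ofc\nul md ..\client\ofc` prevents. For illustration, here is the same "create the parent directory first, then write" pattern in Python; `write_update_table` is a hypothetical helper and not part of ExtractIdsAndFileNames.vbs.

```python
import os
import tempfile


def write_update_table(path, rows):
    """Write rows to path, creating missing parent directories first."""
    # Same idea as: if not exist ..\client\ofc\nul md ..\client\ofc
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(row + "\n")


# Usage in a throwaway directory: client\ofc does not exist beforehand
with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, "client", "ofc", "UpdateTable-ofc-enu.csv")
    write_update_table(target, ["U1,a.cab"])
    print(os.path.isfile(target))
```

Because `exist_ok=True` is passed, calling the helper again for the next language does not fail when the directory is already there.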

Re: How to calculate ofc updates 20x faster

Posted: 30.01.2016, 22:51
by WSUSUpdateAdmin
What would I do without you guys... :roll:
Fixed.

Many thanks and have a nice Sunday,
Torsten

P.S.: Once again, this shows how delicate changes in certain places can be...

Re: How to calculate ofc updates 20x faster

Posted: 31.01.2016, 01:14
by boco
That's why you shouldn't rant about every Windows bug (*): they have a bit more code to deal with...


(*) This applies to almost every software vendor.