Sarah Maurer asks: I’m using a GC/MS to analyze some samples, but the Thermo Fisher Xcalibur software on the instrument only allows us to export the data one CSV at a time, which is an error-prone process. Is there any way to automate this? Here are some notes on how to do this in 2023…

Using MSFileReader and pypiwin32 (not recommended)

tl;dr: this can do the trick and explains many things, but it relies on deprecated software and is a bit fiddly. This is not the way.

I found a thorough (70 minute) YouTube video describing how to do this.

The video mentions mzXML, but instead prefers to use Thermo’s MSFileReader software (reference manual). There’s a download link provided in the YouTube video comments, but you need a Thermo registration.

However, the MSFileReader library is a C++ library. To use it from Python, install pypiwin32 (python -m pip install pypiwin32), which gives Python access to the library through Windows COM.

Sample code begins at 13:32…and then switches to Python 2.7; I’m not sure whether this is a problem with pypiwin32 or with the auteur’s Python installation environment.

He then walks through getting the spectra out of a RAW file and illustrates many of the steps involved.

RawFileReader + RawQuant/RawTools (recommended)

tl;dr: These appear to be convenience wrappers around RawFileReader that should do what we want without programming. This is the way! Alas, they are written in C#, but there appear to be some command-line programs and GUIs that do most of what we want to accomplish.

According to the internets, MSFileReader, the underlying library used by the Python bindings above, is outdated and buggy. Thermo now recommends using RawFileReader to read Thermo RAW files.

There is a Python binding, RawQuant, for which development stopped in January 2019.

The authors have instead put their effort into RawTools, a C# library which they claim is faster. They also have a paper about it in J. Proteome Res. 2019, and it should work on any operating system. It appears to have been actively developed through May 2022.

Note that RawTools appears to be a standalone tool with a GUI, so it can probably be used as is, without having to do any programming, for what we want. This is probably the best approach.
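Whichever command-line converter ends up being used, the "one file at a time" problem comes down to looping over a folder of RAW files. Here is a minimal Python sketch of that loop. The function names and the template mechanism are my own invention for illustration. The default template assumes ProteoWizard's msconvert (listed under the other resources below) is installed and on the PATH, using its --mzXML and -o options; for RawTools or any other converter, substitute the arguments given in that tool's own help output.

```python
from pathlib import Path
import subprocess

# Assumption: ProteoWizard's msconvert is installed and on PATH.
# Replace this list with the arguments of whichever converter you use.
MSCONVERT_TEMPLATE = ["msconvert", "{raw}", "--mzXML", "-o", "{outdir}"]


def build_commands(raw_dir, out_dir, template=MSCONVERT_TEMPLATE):
    """Return one argument list per .raw file found directly in raw_dir."""
    commands = []
    for raw in sorted(Path(raw_dir).iterdir()):
        if raw.suffix.lower() != ".raw":
            continue  # skip anything that is not a RAW file
        values = {"raw": str(raw), "outdir": str(out_dir)}
        commands.append([arg.format(**values) for arg in template])
    return commands


def run_all(commands, dry_run=True):
    """Print each command, and execute it unless dry_run is set."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure
```

Calling run_all(build_commands(raw_folder, output_folder)) only prints the commands, which makes it easy to check them before re-running with dry_run=False.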

RawFileReader (?)

tl;dr: This is Thermo Fisher’s officially released C#-based parser for RAW files, which is intended to replace MSFileReader. It appears to be active (updated yesterday!). Is this a lower level than RawTools above?

RawFileReader is a group of .NET assemblies written in C# used to read Thermo Scientific RAW files. The assemblies can be used to read RAW files on Windows, Linux, and macOS using C# or other languages that can access a .NET assembly.

Other resources found in my readings

  • pyMSFileReader: Python bindings for MSFileReader (tested on versions 3.0SP2 (August 2014) and 3.0SP3). However, the author notes that these are deprecated, as discussed above.
  • mzXML: in its developers’ words, “an open data format for storage and exchange of mass spectroscopy data, developed at the SPC/Institute for Systems Biology. mzXML provides a standard container for ms and ms/ms proteomics data and is the foundation of our proteomic pipelines. Raw, proprietary file formats from most vendors can be converted to the open mzXML format.”
  • ReAdW: a command-line program that converts Thermo Xcalibur .raw files to mzXML. However, its GitHub page suggests that it requires MsFileReader_x64.exe; helpfully, it provides a link to Thermo’s GitHub for that.
  • XCaliburMethodReader — A simple command-line program for mass spectrometry researchers that extracts and converts data from Thermo XCalibur Method (.meth) files.
  • ProteoWizard Toolkit: 2012 paper, 2017 paper, another 2017 paper
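Since the original goal was CSV output, it is worth seeing what getting from mzXML to CSV involves. As I understand the mzXML schema, each scan carries a peaks element whose text is a base64-encoded array of big-endian ("network" byte order) floats, stored as alternating m/z and intensity values, with a precision attribute of 32 or 64 and optional zlib compression. The sketch below decodes one such payload and formats it as CSV. The function names are mine, and the encoding details are assumptions that should be checked against the attributes of the actual files.

```python
import base64
import csv
import io
import struct
import zlib


def decode_peaks(text, precision=32, compressed=False):
    """Decode the text of one mzXML peaks element into (m/z, intensity) pairs.

    Assumes network (big-endian) byte order and m/z-intensity pair order.
    """
    raw = base64.b64decode(text)
    if compressed:
        raw = zlib.decompress(raw)
    code = "f" if precision == 32 else "d"  # 4-byte or 8-byte floats
    count = len(raw) // (precision // 8)
    # Raises struct.error if the payload size does not match the precision.
    values = struct.unpack(f">{count}{code}", raw)
    return list(zip(values[0::2], values[1::2]))


def peaks_to_csv(peaks):
    """Format (m/z, intensity) pairs as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["mz", "intensity"])
    writer.writerows(peaks)
    return buf.getvalue()
```

In practice the payload text and its precision and compression settings would be read from each scan with an XML parser, and one CSV written per scan or per file.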

Continued readings and gleanings

(updated 20 Jan 2023)

  • Haas CP, Lübbesmeyer M, Jin EH, McDonald MA, Koscher BA, Guimond N, et al. Open-Source Chromatographic Data Analysis for Reaction Optimization and Screening. ChemRxiv 2022. doi:10.26434/chemrxiv-2022-0pv2d. Describes a Python library for reading, processing, and extracting information from data produced by a variety of HPLC machines/vendors.