Wednesday 15 May 2013

c# - Parallelize XML Reading gone wrong


I work with large XML files (~2 GB). So far, I have been reading them like this:

    private void readParameters(XmlReader m, Measurement measurement)
    {
        while (m.ReadToFollowing("PAR"))
        {
            // Disposing the subtree reader also closes it
            using (XmlReader par = m.ReadSubtree())
            {
                readParameter(par, measurement);
            }
        }
    }

That worked, but it was slooooow. So I summoned my science and tried to parallelize the reading:

    private void readParameters(XmlReader m, Measurement measurement)
    {
        List<XmlReader> readers = new List<XmlReader>();
        while (m.ReadToFollowing("PAR"))
        {
            readers.Add(m.ReadSubtree());
        }
        Parallel.ForEach(readers, reader =>
        {
            readParameter(reader, measurement);
            reader.Close();
            ((IDisposable)reader).Dispose();
        });
    }

But every iteration of the foreach reads the same node. How do I fix it? Is this a good way to parallelize the reading?

That happens because, as the ReadSubtree documentation says:

ReadSubtree can be called only on element nodes. When the entire sub-tree has been read, calls to the Read method return false. When the new XmlReader has been closed, the original XmlReader will be positioned on the EndElement node of the sub-tree. Thus, if you called the ReadSubtree method on the start tag of a book element, after the sub-tree has been read and the new XmlReader has been closed, the original XmlReader is positioned on the end tag of the book element. You should not perform any operations on the original XmlReader until the new XmlReader has been closed. This action is not supported and can result in unpredictable behavior.
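To see the positioning rule from the quote in action, here is a small self-contained sketch. The class `SubtreeDemo`, the method `PositionAfterSubtree` and the sample XML are made up for illustration; only the XmlReader calls are the real API. It reads one `book` subtree to the end, closes the subtree reader, and reports which node the outer reader is left on.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical demo class: shows where the outer reader ends up after ReadSubtree().
public static class SubtreeDemo
{
    // Reads the first <book> subtree to the end, closes it, and reports
    // the node the outer reader is positioned on afterwards.
    public static (XmlNodeType NodeType, string Name) PositionAfterSubtree(string xml)
    {
        using XmlReader outer = XmlReader.Create(new StringReader(xml));
        outer.ReadToFollowing("book");

        using (XmlReader sub = outer.ReadSubtree())
        {
            // Read() returns false once the whole subtree has been consumed.
            while (sub.Read()) { }
        }

        // The subtree reader is closed, so it is now safe to inspect the outer reader.
        return (outer.NodeType, outer.LocalName);
    }

    public static void Main()
    {
        var (nodeType, name) = PositionAfterSubtree(
            "<library><book><title>A</title></book><book><title>B</title></book></library>");
        Console.WriteLine($"{nodeType} {name}"); // EndElement book
    }
}
```

Every reader returned by ReadSubtree() is a view over that one outer reader, so the readers collected in the question's list cannot each "remember" their own PAR element.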

This method is not thread-safe, and you cannot put the readers returned by ReadSubtree() "aside" and use them later.

More generally, the documentation describes XmlReader like this:

Represents a reader that provides fast, noncached, forward-only access to XML data.

Obviously you cannot do whatever you want with it. Because XmlReader is forward-only, reading in parallel would require "forking" the stream: every clone of the XmlReader would need either its own copy of the stream (which the stream does not guarantee is possible) or an XmlReader that caches the nodes it has already read (which XmlReader does not guarantee either). Instead, as suggested by @MixZ, you can copy each element out first:

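    List<XElement> elements = new List<XElement>();
    while (m.ReadToFollowing("PAR"))
    {
        // Close the subtree reader before moving m again
        using (XmlReader par = m.ReadSubtree())
        {
            elements.Add(XElement.Load(par));
        }
    }
    Parallel.ForEach(elements, element =>
    {
        // your work here, on an independent copy of one PAR element
    });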

But I'm not sure this will change anything except your memory usage (your file is more than 2 GB :-)), because the entire XML is still parsed on the "main" thread, and every PAR element is loaded into an XElement object in memory.
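One more thing to check in either version: the question passes the same `measurement` object to every parallel call, so unless that object is thread-safe the loop bodies race on it. A minimal sketch of the buffered approach that collects results into a `ConcurrentDictionary` instead. The class `BufferedParallelDemo` and the PAR layout (`<name>` and `<value>` children) are assumptions for illustration, since the post does not show the real PAR schema or the `Measurement` type.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using System.Xml;
using System.Xml.Linq;

public static class BufferedParallelDemo
{
    // Hypothetical PAR layout: <PAR><name>...</name><value>...</value></PAR>.
    public static ConcurrentDictionary<string, double> ReadParameters(XmlReader m)
    {
        // Single-threaded: copy each PAR element out of the forward-only reader.
        var elements = new List<XElement>();
        while (m.ReadToFollowing("PAR"))
        {
            using (XmlReader par = m.ReadSubtree())
            {
                elements.Add(XElement.Load(par));
            }
        }

        // Parallel: each body reads only its own XElement and writes into a
        // thread-safe dictionary instead of a shared, unsynchronized object.
        var results = new ConcurrentDictionary<string, double>();
        Parallel.ForEach(elements, element =>
        {
            string name = (string)element.Element("name");
            double value = (double)element.Element("value");
            results[name] = value;
        });
        return results;
    }

    public static void Main()
    {
        const string xml =
            "<root><PAR><name>a</name><value>1.5</value></PAR>" +
            "<PAR><name>b</name><value>2</value></PAR></root>";
        using XmlReader reader = XmlReader.Create(new StringReader(xml));
        Console.WriteLine(ReadParameters(reader).Count); // 2
    }
}
```

If the per-element work is cheap, the parallel part may not pay for itself; the parsing in the first loop is still sequential.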

Or maybe you can try:

    public sealed class MyClass : IEnumerable<XElement>, IDisposable
    {
        public readonly XmlReader Reader;

        public MyClass(XmlReader reader)
        {
            Reader = reader;
        }

        // sealed class, so a plain Dispose() is enough
        public void Dispose()
        {
            Reader.Dispose();
        }

        public IEnumerator<XElement> GetEnumerator()
        {
            while (Reader.ReadToFollowing("PAR"))
            {
                XElement element;
                // close the subtree reader before moving Reader again
                using (XmlReader par = Reader.ReadSubtree())
                {
                    element = XElement.Load(par);
                }
                yield return element;
            }
        }

        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }

    private static void readParameters(XmlReader m, Measurement measurement)
    {
        var enu = new MyClass(m);
        Parallel.ForEach(enu, element =>
        {
            // your work here
        });
    }

Now Parallel.ForEach enumerates MyClass (sorry for the name :-)), which loads the sub-trees lazily, one at a time.
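The same idea can be written as a plain iterator method instead of a class. Below is a self-contained sketch. `StreamingParallelDemo`, `ReadParElements`, `CountInParallel` and the sample XML are hypothetical names for illustration. It relies on the fact that Parallel.ForEach's default partitioner for an `IEnumerable<T>` takes a lock while pulling items from the enumerator. Because of that lock, the non-thread-safe XmlReader is only advanced by one thread at a time, while the loop bodies run concurrently on detached XElement copies.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using System.Xml;
using System.Xml.Linq;

public static class StreamingParallelDemo
{
    // Iterator-method version of MyClass: yields one detached XElement per
    // PAR element, reading the file lazily as the loop asks for more.
    public static IEnumerable<XElement> ReadParElements(XmlReader reader)
    {
        while (reader.ReadToFollowing("PAR"))
        {
            XElement element;
            using (XmlReader par = reader.ReadSubtree())
            {
                element = XElement.Load(par);
            }
            // The subtree reader is closed before the element is handed out.
            yield return element;
        }
    }

    public static int CountInParallel(XmlReader reader)
    {
        int processed = 0;
        // The default partitioner locks around MoveNext(), so only one thread
        // advances the XmlReader at a time; the bodies below run concurrently.
        Parallel.ForEach(ReadParElements(reader), element =>
        {
            // ... per-element work on 'element' goes here ...
            Interlocked.Increment(ref processed);
        });
        return processed;
    }

    public static void Main()
    {
        const string xml =
            "<root><PAR><v>1</v></PAR><PAR><v>2</v></PAR><PAR><v>3</v></PAR></root>";
        using XmlReader reader = XmlReader.Create(new StringReader(xml));
        Console.WriteLine(CountInParallel(reader)); // 3
    }
}
```

The default partitioner hands elements to workers in small chunks, so only those chunks are held in memory rather than the whole file. If you want each worker to take exactly one element at a time, you can pass `Partitioner.Create(ReadParElements(reader), EnumerablePartitionerOptions.NoBuffering)` (from System.Collections.Concurrent) to Parallel.ForEach instead.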

