This snippet of code shows how to do Synchronisation or HTML tag replacement

-My problem is i want to synchronise of tags between 2 HTML document

Issue

-Source.htm

my source.htm has this tag

<xml id=STATIC_CRITERIA_>

  <fields>

    <field fieldID=OPEN_FLAG displayType=checkboxGroup searchType=checkboxGroup underlyingType=int />

    <field fieldID=PARTITION displayType=dropdown searchType=profile profileID=SU_PARTITIONS underlyingType=int />

    <field fieldID=TEMPLATE_IND displayType=checkbox searchType=checkbox underlyingType=int />

    <field fieldID=INCLUDE_DELETED displayType=checkbox searchType=checkbox underlyingType=string />

  </fields>

</xml>

 -Target.htm has this tag

<xml id=STATIC_CRITERIA_>

</xml>

So now the problem is how do I fill the gap in XML tag in target.htm with the value from mySource.htm. We can do this using regex and it’s very simple

 

  string originalDocument = Load(“c:\\MySource.htm”);

                string syncDocument = Load(“c:\\Target.htm”);

 

                MatchCollection mc = Regex.Matches(originalDocument, “<xml([ ]?.*?)>(.*?)</xml>”, RegexOptions.Singleline | RegexOptions.IgnoreCase);

 

                foreach (Match m in mc)

                {

                    string token = “<xml{0}>”;

                    syncDocument = Regex.Replace(syncDocument, String.Format(token, m.Groups[1].Value) + “(.*?)</xml>”,

                                                               String.Format(token, m.Groups[1].Value) + m.Groups[2].Value + “</xml>”,

                                                               RegexOptions.Singleline | RegexOptions.IgnoreCase);

                }

 MatchCollection is used to return all the instances of that regular expression (e.g you might have multiple XML tags with different ID)

What does this tag means

“<xml([ ]?.*?)>(.*?)</xml>”

 ([ ]?.*) means that I don’t care whatever after it (e.g it can be ID or attributes etc). this is the first parameter that we stored on variable

(.*?) means that whatever inside the tag are captured into the second parameter that we stored on variable

You can access the first variable (e.g any attributes after the XML) by doing

m.Groups[1].Value

you can access the value inside the xml bracket by using

m.Groups[2].Value

so what  contains in

m.Groups[0].Value

it contains the whole xml tag that we extracted using regex

 then we can use regex.replace method to replace the tag in target html. When you replace the tag you need to replace the whole xml tag. You can’t just replace the inner part of it

   foreach (Match m in mc)

                {

                    string token = “<xml{0}>”;

                    syncDocument = Regex.Replace(syncDocument, String.Format(token, m.Groups[1].Value) + “(.*?)</xml>”,

                                                               String.Format(token, m.Groups[1].Value) + m.Groups[2].Value + “</xml>”,

                                                               RegexOptions.Singleline | RegexOptions.IgnoreCase);

                }