InfoPathI recently completed a project based on InfoPath 2007 (Office client version) and Microsoft Office SharePoint Services 2007 (MOSS 2007). Looking back I can say that InfoPath has its uses, but before you build a solution around it you have to be very sure about its limitations. InfoPath has a number of limitations, especially with regards to submitting data, that aren’t that apparent at first sight. If you don’t watch out, you can quickly get caught up in what feels like a never ending maize of dead ends.

InfoPath is often pitched as a solution that doesn’t require writing custom code. This project was no different, and its time lines were made accordingly. In the end we had to write a fair amount of custom code, which was fun, but took more time.

The Many Limitations Of InfoPath DataBase Data Source

InfoPath generally works well viewing standard enterprise data sources such as database tables or SharePoint lists. The limitations become apparent when you attempt to submit to a database using an InfoPath SQL connection, or perform advanced queries. There are a number of limitations when you work with an InfoPath SQL database data source/connection:

  1. Only submit to a single table. This excludes database data sources such as views, and queries with joins. You cannot submit to views or SQL DataSources with joins.
  2. To submit to a database you can only use the main data connection. In other words you can’t have a database-view as the main data source, and setup another simple single table select to submit to.
  3. Range queries are not possible. You can only use a field once in a query’s WHERE clause with an equality operator.
  4. SQL data source dependent on table schema. If a SQL data source’s underlying table is modified, even just adding a column (in other words InfoPath’s SELECT statement doesn’t actually change), the data source will break.
Data Connection Wizard

Data Connection Wizard

With SharePoint lists you cannot query the data source with queryfields like relational data sources.

The above limitations, especially regarding relational data sources, mean one thing: Web services are mandatory for working with your relational data. Using web services allows you to overcome all the limitations of the standard InfoPath SQL data source, and work with a consistent schema.

Another thing to watch out for is that InfoPath’s performance deteriorates quickly when you have more than 50 rows in your result set. Sometimes this figure is much lower. In the project I worked on the data was coming lightning fast from the data base through the web service. But when the data hits the form, and InfoPath starts parsing the XML document, it completely froze for quite a while. I have decided not to torture myself trying to page my form data, so I haven’t looked into this yet (and I believe InfoPath is not meant to be used in this manner). The quickest and most effective solution I could come up with is to allow users to load data into their form incrementally. How this works is that you’ll do a normal retrieve of your data from the data source, but instead of clearing the form, you’ll just add the new result set to the rest of the form’s data. The big drawback of this is that you need to write custom code to modify the XML document directly using XmlWriter: Not a too pleasant exercise.

public void Load_Clicked(object sender, ClickedEventArgs e)
{
  // Call the web service of the secondary DataSource, which will populate it         
  DataSources["ClientWS2"].QueryConnection.Execute();
  var clients = DataSources["ClientWS2"].CreateNavigator().Select("/dfs:myFields/dfs:dataFields/tns:GetClientsResponse/tns:GetClientsResult/tns:Client", NamespaceManager);

  // The 1st time rows are added GetClientsResult might not exist, only GetClientsResponse
  var main = MainDataSource.CreateNavigator().SelectSingleNode("/dfs:myFields/dfs:dataFields/tns:GetClientsResponse/tns:GetClientsResult", NamespaceManager);
  if (main == null) main = MainDataSource.CreateNavigator().SelectSingleNode("/dfs:myFields/dfs:dataFields/tns:GetClientsResponse", NamespaceManager);

  using (XmlWriter writer = main.AppendChild())
  {
    // Make sure we are adding Client elements to /dfs:myFields/dfs:dataFields/tns:GetClientsResponse/tns:GetClientsResult and not, /dfs:myFields/dfs:dataFields/tns:GetClientsResponse
    if (main.LocalName == "GetClientsResponse")
    {
      // So if it doesn't exist, create it first
      writer.WriteStartElement("GetClientsResult", "http://sh.inobido.com/CRM/Service");
    }

    while (clients.MoveNext())
    {
      writer.WriteStartElement("Client", "http://sh.inobido.com/CRM/Service");

      // Select all the client element's child elements
      var fields = proposals.Current.Select("*", NamespaceManager);
      while (fields.MoveNext())
      {
        // Write each element and value to the Main DataSource
        writer.WriteStartElement(fields.Current.Name, "http://sh.inobido.com/CRM/Service");
        writer.WriteString(fields.Current.Value);
        writer.WriteEndElement();
      }

      writer.WriteEndElement();
    }

    if (main.LocalName == "GetClientsResponse)
    {
      writer.WriteEndElement();
    }
    writer.Close();
  }
}

The above event fires when a user clicks the Load button. The trick to load data incrementally is that you need a second DataSource exactly the same as the Main DataSource (they should point to the same data store). Whenever you call DataSource.QueryConnection.Execute() InfoPath will wipe any previous data from that DataSource, and reload it with the new data. That’s why you need a separate second DataSource that you call Execute on, and then copy that data to the Main DataSource. The end result is the Main DataSource doesn’t lose its data, but data gets added to it on each query.

Just another side note on InfoPath: Pivot tables are not possible, because you have to know exactly which columns your binding to at design time, and cannot create columns dynamically at runtime. This shouldn’t be a show stopper to most projects, but I’m just mentioning it. All the InfoPath forms we had to do came from Excel spreadsheets, and the one spreadsheet was a monster pivot table.

Hacking The DataConnection

It’s possible to query a data connection directly from InfoPath, change the SQL command dynamically, or extract the connection string. The biggest drawback of this hack (apart from being a hack, i.e. not recommended) is that it requires FullTrust and Sql Code Access Security (CAS) permissions. That means you have to certify your InfoPath form, or create an installer so users have to install it locally onto their machines. This doesn’t really work well when the form is made available to users through a SharePoint document library.

Anyways, here is a very unrefined sample to achieve this:

private const string CONNECTION_STRING = "Server={0};Database={1};User ID={2};Password={3};Trusted_Connection=False;";

private string GetConnectionString(AdoQueryConnection queryConnection)
{
  var password = GetConnectionStringParameter(queryConnection, "Password");
  var user = GetConnectionStringParameter(queryConnection, "User ID");
  var server = GetConnectionStringParameter(queryConnection, "Data Source");
  var db = GetConnectionStringParameter(queryConnection, "Initial Catalog");
  return string.Format(CONNECTION_STRING, server, db, user, password);
}

// Hmmm, if your using my wonderful hack, then you might want to
// consider rewriting this method to use regular expressions instead ;-)
private string GetConnectionStringParameter(AdoQueryConnection queryConnection, string name)
{
  var paramIndex = queryConnection.Connection.IndexOf(name + "=");
  var parameter = queryConnection.Connection.Substring(paramIndex, queryConnection.Connection.IndexOf(";", paramIndex) - paramIndex);
  return parameter.Substring(parameter.IndexOf("=") + 1);
}

private IDataReader SelectWorksheetItems(SqlConnection connection, int pocketID)
{
  using (var dbCommand = new SqlCommand("WorksheetItemGetByPocket", connection))
  {
    connection.Open();
    dbCommand.Parameters.Add("@pocketID", SqlDbType.Int).Value = pocketID;
    dbCommand.CommandType = CommandType.StoredProcedure;
    return dbCommand.ExecuteReader();
  }
}

worksheetItemCurrentDS = DataSources["WorksheetItemCurrent"];
worksheetItemCurrentCmd = ((AdoQueryConnection)worksheetItemCurrentDS.QueryConnection).Command + " where \"PocketID\"={0}";

using (var connection = new SqlConnection(GetConnectionString((AdoQueryConnection)worksheetItemCurrentDS.QueryConnection)))
{
  using (var reader = SelectWorksheetItems(connection, 24))
  {
    // Do some stuff with the DataReader here...
  }
}

InfoPath’s different data sources each use a specific data connection that inherits from the abstract class Microsoft.Office.InfoPath.DataConnection. The main point of the above example is that you can cast your InfoPath’s DataConnections to its specific implementation. For SQL database data sources InfoPath uses AdoQueryConnection. With AdoQueryConnection you have the ability to extract or manipulate the data source’s command and connection string.

Using SQL Server To Store InfoPath Documents

You cannot call store procedures directly from InfoPath, but if you develop on SQL Server 2005 or later, you can use SQL Web Services to call stored procedures as a web service. The big catch here is that SQL authentication and SQL Web Services don’t really go well together. Therefore when using SQL authentication for your InfoPath DataConnections you will either have to support integrated authentication for calls coming through the SQL Web Service (and SQL authetication for direct calls from the InfoPath form), or you will have to throw open access to your stored procedure to all users. If your using SQL authentication, there’s usually a good reason your doing so, so additionally supporting integrated authentication might not be an option. Giving access to anyone is an even worse idea.

CREATE ENDPOINT ClientInsertEndpoint
STATE = STARTED
AS HTTP
(
  SITE = 'ServerName',
  PATH = '/WebServiceName',
  AUTHENTICATION = (NTLM),
  PORTS = (CLEAR)
)
FOR SOAP
(
  WEBMETHOD 'ClientInsert'
  (
    NAME = 'DataBase.dbo.ClientInsert',
    SCHEMA = DEFAULT,
    FORMAT = ROWSETS_ONLY
  ),
  WSDL = DEFAULT,
  BATCHES = DISABLED,
  DATABASE = 'DataBase'
)
GO

If is possible to support SQL authentication for SQL Server Web Services, but this requires a SSL server certificate. Microsoft also plans to remove this feature from SQL Server in future releases.

“This feature will be removed in a future version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.”

Definitely read Microsoft’s best practices for SQL Server Native XML Web Services.

Finally, you can use SQL Server 2005’s XML data type to save an InfoPath Form or query it in a relational format. Here’s a sample stored procedure that takes in the root node of the InfoPath’s XML document, and inserts the items into a table:

ALTER PROCEDURE [dbo].[ClientUpdate]
(
  @clientsXml XML
)
AS
  INSERT INTO  Client (FirstName,
               LastName,
               CellNo,
               TelNo,
               WorkNo)
SELECT         Clients.Client.query('data(@FirstName)').value('.', 'VARCHAR(25)') as FirstName,
               Clients.Client.query('data(@LastName)').value('.', 'VARCHAR(25)') as LastName,
               Clients.Client.query('data(@CellNo)').value('.', 'VARCHAR(25)') as CellNo,
               Clients.Client.query('data(@TelNo)').value('.', 'VARCHAR(25)') as TelNo,
               Clients.Client.query('data(@WorkNo)').value('.', 'VARCHAR(25)') as WorkNo
FROM           @clientsXml.nodes('declare namespace d="http://schemas.microsoft.com/office/infopath/2003/ado/dataFields namespace dfs="http://schemas.microsoft.com/office/infopath/2003/dataFormSolution"; //d:Client') Clients(Client)

The new SQL XML syntax is a bit tricky, but once you get it right it works wonderfully well.

XPath Expressions Are Your Friend

Conditional Format

Conditional Format

Conditional formatting and XPath expressions are very handy to display unique values in a RepeatingTable. For instance say you’ve got a Client object, with multiple Addresses – street, postal, and work. Say you only wanted to show a client’s name once, and list each of his addresses without repeating his name. When you’re using a SQL DataSource, you will do a left join with the address table from the client table. This means you’ll repeat the same client name for each address.

To solve the aforementioned you need to make sure you order by client name, and then hide the textbox with a XPath expression:

tns:ClientName = preceding-sibling::tns:Client/tns:ClientName

What this expression is saying is that if the current Client’s ClientName is the same as the previous Client’s, then do something. That something is the action you’ll check on the Conditional Formatting window, that will be “hide” in our case.

This approach can be extended to multiple fields. All you have to do is make sure your order sequence is correct. So just by ordering your resultset corrrectly and using the right XPath expression, you’ll achieve quite a bit without having to write code.

InfoPath And Visual SourceSafe Does Not Play Well Together

If you’re creating InfoPath forms with Visual Studio Tools for Office and using Visual SourceSafe for source control, you will quickly get a whole range of different and meaningless error messages. Here are the general things to do to resolve them:

  1. Make sure all files in your Visual Studio InfoPath project’s “InfoPath Form Template” directory are checked out, before doing any work on manifest.xsf (the InfoPath form).
  2. If you’re having trouble checking files out of SourceSafe, from within Visual Studio:a. Close Visual Studio.b. Open the Visual SourceSafe application, and check out the files for the project from there. Once you’ve done this you can close Visual SourceSafe.c. Make the directory “InfoPath Form Template” and all its content writeable, by unchecking the Read-only option from the folder’s Options.d. Reopen Visual Studio, and continue working as usual.

Most Annoying InfoPath Deployment

Another aspect of InfoPath you need to consider is how you’re deploying your forms. To deploy a form you need to manually, that’s right manually, update each DataConnection to point to your production environment. Ouch! If you have say 5 forms, with 5 DataConnections each, then your looking at 25 DataConnections to manually update. Nasty! And if you mess one connection up, you’ve got a problem.

SQL DataConnections are the worst to update. When you want to change to a new DataBase, InfoPath completely clears your original select statement and forgets which table you were using, and you have to reselect the columns/table. Should your DataSource’s schema change (i.e. your select statement is not exactly the same as previously), InfoPath will do you the favor of removing your controls’ databindings. Most of the time you’ll probably use all the table’s columns, but you still have to go and re-select that table.

Web Service DataConnections are the easiest to reconfigure (but still pretty painful). You can just take the URL of the new web service, and copy it into the web service address box, and quickly click Next through the DataConnection wizard. InfoPath doesn’t forget which web method your DataConnection uses, like it does with SQL table DataConnections.

The aforementioned makes it extremely time consuming and error prone to deploy InfoPath forms between development, QA and production environments.

Conclusion

  1. Use web services to retrieve and save form data, and plan accordingly. I cannot state this enough. Yes, maybe for the simplest of simple forms you can get away with using InfoPath’s SQL DataConnection (and I mean really simple), but for everything else a web service is an absolute must.
  2. Try to avoid large editable, repeating data grids (or referred to as a RepeatingTable in InfoPath lingo). Be extra cautious when you’re planning on editing large result sets, with lots of drop down lists and lookup data. Forms that work best are ones that displays and edits a single entity, and apposed to forms that edit multiple instances of an entity on the same form.
  3. Don’t think you’re going to deploy those 5 forms in a few minutes. Give yourself enough time for the deployment, and to test each form afterwards to check that you didn’t mess a DataConnection up.
  4. Do a quick prototype of your forms to check whether InfoPath can really handle it. In my case the person who recommended InfoPath for the solution should have checked that it can accommodate pivot tables. This is general good software dev practice, but I think because of all InfoPath’s restrictions, I think one needs to be particularly careful.We’ve gotten so used to having control over every element of the user interface with ASP .NET and Windows Forms that we expect the same of other technologies we use. Remember InfoPath’s controls and their behaviour dictate how your information is displayed. You do not have access to the underlying API that these controls are based on. In other words know what InfoPath’s UI controls can do, because you won’t be able to write your own.

Bioinformatics Comes To .NET

December 23, 2008

My interest in molecular biology grows with each issue of Scientific American and each Technology Review post I read.  For a C# and Java developer, such as myself, Bioinformatics is a natural progression from programming into cell biology. I’m also a big open source proponent. So what do you get when you throw all these together? An open source project that brings the popular BioJava bioinformatics library to .NET, where I’m the latest team addition. BioSharp is the first open source project that I am actively working on.

BioSharp was started by Doug Swisher out of curiosity about how BioJava was implemented. Since .NET lacked an equivalent framework, Doug decided to create a port of BioJava in C# to aid his learning.

Currently I’m converting the classes from the org.biojava.ontology package to the BioSharp.Core.Ontology gene ontology namespace. As a non-native English speaker, the term ontology was new to me, and immediately caught my attention. So I spontaneously started working  on BioSharp’s gene ontology.

It turns out implementing the gene ontology library requires more knowledge about parsing files and building hierarchies of related terms (which could be any term not just biological terms) than knowledge about genes and cells. This suits me very well at this early stage, since my knowledge of cellular biology is extremely limited. I’m definitely working on BioSharp out of enthusiasm for molecular biology, rather than being an expert at anything biology related.

Business Day had a brief article where Peter Waker, national Computer Olympiad manager claims that university computer science departments are not getting enough students to satisfy the industry’s needs. The Computer Olympiad is a national competition to grow programming skills among high school pupils. Peter does not give any figures to substantiate his claim.

Personally as a software developer, I am encoruaged by his comment, as this means at least in some parts the IT industry is still growing. I am however concerned that this will not last very long, with the global economy teetering on a recession, and possibly a depression.

My personal experience is that demand has definitely come down in the past 12 months. Before then I would easily get a few requests every month from people asking whether I know any developers with some experience. Let’s hope the IT industry proves its resilience during the coming trying times, and that that perceived supply-demand mismatch is in fact a reality and gives us enough of a buffer to sustain us through the downturn.

Alt.Net South Africa

August 9, 2008

A local South African Alt.Net community was formed with a portal and e-mailing list: http://altdotnet.org.za and http://tech.groups.yahoo.com/group/altdotnet-za/ .

Alt.Net is a passionate movement that pushes the boundaries of software development by exploring beyond the mainstream and status quo, for new and better ways of building software, with a focus on the techniques, experience, principles, and patterns behind the tools.

The purpose of this site is to bring together South African developers who are following the international Alt.Net discussions. The idea is not to replace or duplicate the international community but to create a local space where we can collaborate to discuss the things that affect us in a Southern African (and African) context.

If you are passionate about getting this underway then please check it out and lets start a discussion on the mailing list and see where we can take it!

If you haven’t already heard of ZA Tech Show, now is the time. It’s an extremely cool podcast of the latest Tech Talk in South Africa.