Import mssql spatial fields into geopandas/shapely geometry

I cannot seem to directly import MSSQL spatial fields into geopandas. I can import normal MSSQL tables into pandas with pymssql without problems, but I cannot figure out a way to import the spatial fields into shapely geometry. I know that the OGR driver for MSSQL should be able to handle it, but I’m not skilled enough in SQL to figure this out. This is more of an issue for lines and polygons, as points can be converted to x and y coordinates from the MSSQL field. Thanks!
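For points, something along these lines works for me (the table and geometry column names here, 'Sites' and 'Shape', are just placeholders for my own):

from pymssql import connect
from pandas import read_sql
from shapely.geometry import Point
from geopandas import GeoDataFrame

conn = connect('SQL2012PROD03', database='LowFlows')
# STX/STY return the x and y coordinates of a point geometry column
pts = read_sql('SELECT SiteID, Shape.STX AS x, Shape.STY AS y FROM Sites', conn)
gdf = GeoDataFrame(pts, geometry=[Point(xy) for xy in zip(pts.x, pts.y)])
conn.close()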

Answer

I figured it out by querying the SQL database for each geometry as a WKT string and converting that string to shapely geometry via the loads function in shapely.wkt.
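The key step is simply asking SQL Server for the WKT text of each geometry and passing that string to loads; a minimal sketch of the conversion (the WKT value here is made up):

from shapely.wkt import loads

# The .ToString() call in the function below returns WKT strings like this one
wkt = 'LINESTRING (0 0, 1 1, 2 0)'
geom = loads(wkt)      # shapely LineString
print(geom.length)     # 2.828...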

I’m no programmer, so bear that in mind regarding how the function is organized. The function can import MSSQL tables with or without a GIS geometry column.

from pymssql import connect
from pandas import read_sql
from shapely.wkt import loads
from geopandas import GeoDataFrame

def rd_sql(server, database, table, col_names=None, where_col=None, where_val=None, geo_col=False, epsg=2193, export=False, path='save.csv'):
    """
    Imports data from an MSSQL database and returns a DataFrame, or a GeoDataFrame if a geometry column is read in. Specific columns can be selected, and values within specific columns can be filtered. Requires the pymssql package, which must be installed separately.
    Arguments:
    server -- The server name (str). e.g.: 'SQL2012PROD03'
    database -- The specific database within the server (str). e.g.: 'LowFlows'
    table -- The specific table within the database (str). e.g.: 'LowFlowSiteRestrictionDaily'
    col_names -- The column names that should be retrieved (list). e.g.: ['SiteID', 'BandNo', 'RecordNo']
    where_col -- The column to filter on in the WHERE clause (str). e.g.: 'SnapshotType'
    where_val -- The values of where_col to select (list). e.g.: ['value1', 'value2']
    geo_col -- Should the geometry column be read in, if one exists? (bool)
    epsg -- The EPSG code of the coordinate system (int)
    export -- Should the data be exported to csv? (bool)
    path -- The path and csv name for the export if 'export' is True (str)
    """
    if col_names is None and where_col is None:
        # No column selection or row filtering: take everything
        stmt1 = 'SELECT * FROM ' + table
    elif where_col is None:
        # Select specific columns; swap single quotes for double quotes so SQL Server
        # treats the names as quoted identifiers
        stmt1 = 'SELECT ' + str(col_names).replace("'", '"')[1:-1] + ' FROM ' + table
    else:
        # Select specific columns and filter rows with a WHERE ... IN (...) clause
        stmt1 = ('SELECT ' + str(col_names).replace("'", '"')[1:-1] + ' FROM ' + table +
                 ' WHERE ' + str([where_col]).replace("'", '"')[1:-1] +
                 ' IN (' + str(where_val)[1:-1] + ')')
    conn = connect(server, database=database)
    df = read_sql(stmt1, conn)

    ## Read in geometry if required
    if geo_col:
        # Find the name of the geometry column automatically
        geo_col_stmt = ("SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME='" +
                        table + "' AND DATA_TYPE='geometry'")
        geo_col_name = str(read_sql(geo_col_stmt, conn).iloc[0, 0])
        # Ask SQL Server for the WKT of each shape
        # (STGeometryN(1) takes the first part of multipart geometries)
        if where_col is None:
            stmt2 = 'SELECT ' + geo_col_name + '.STGeometryN(1).ToString() FROM ' + table
        else:
            stmt2 = ('SELECT ' + geo_col_name + '.STGeometryN(1).ToString() FROM ' + table +
                     ' WHERE ' + str([where_col]).replace("'", '"')[1:-1] +
                     ' IN (' + str(where_val)[1:-1] + ')')
        df2 = read_sql(stmt2, conn)
        df2.columns = ['geometry']
        # Convert the WKT strings to shapely geometries
        geometry = [loads(x) for x in df2.geometry]
        df = GeoDataFrame(df, geometry=geometry, crs={'init': 'epsg:' + str(epsg)})

    if export:
        df.to_csv(path, index=False)

    conn.close()
    return df
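
A call using the example values from the docstring would look something like this (the where values are just placeholders):

sites = rd_sql('SQL2012PROD03', 'LowFlows', 'LowFlowSiteRestrictionDaily',
               col_names=['SiteID', 'BandNo', 'RecordNo'],
               where_col='SnapshotType', where_val=['value1', 'value2'],
               geo_col=True, epsg=2193)
print(sites.crs)
sites.plot()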

EDIT: Made the function automatically find the geometry field if one exists.
