I cannot seem to be able to directly import mssql spatial fields into geopandas. I can import normal mssql tables into pandas with Pymssql without problems, but I cannot figure out a way to import the spatial fields into shapely geometry. I know that the OGR driver for mssql should be able to handle it, but I’m not skilled enough in sql to figure this out. This is more of an issue for lines and polygons as points can be converted to x and y coordinates from the mssql field. Thanks!
Advertisement
Answer
I figured it out by properly querying the sql database table and converting the wkt string to shapely geometry via the loads function in shapely.wkt.
I’m no programmer, so bear that in mind with the organization of the function. The function can import mssql tables with or without GIS geometry.
from pymssql import connect from pandas import read_sql from shapely.wkt import loads from geopandas import GeoDataFrame def rd_sql(server, database, table, col_names=None, where_col=None, where_val=None, geo_col=False, epsg=2193, export=False, path='save.csv'): """ Imports data from MSSQL database, returns GeoDataFrame. Specific columns can be selected and specific queries within columns can be selected. Requires the pymssql package, which must be separately installed. Arguments: server -- The server name (str). e.g.: 'SQL2012PROD03' database -- The specific database within the server (str). e.g.: 'LowFlows' table -- The specific table within the database (str). e.g.: 'LowFlowSiteRestrictionDaily' col_names -- The column names that should be retrieved (list). e.g.: ['SiteID', 'BandNo', 'RecordNo'] where_col -- The sql statement related to a specific column for selection (must be formated according to the example). e.g.: 'SnapshotType' where_val -- The WHERE query values for the where_col (list). e.g. ['value1', 'value2'] geo_col -- Is there a geometry column in the table? epsg -- The coordinate system (int) export -- Should the data be exported path -- The path and csv name for the export if 'export' is True (str) """ if col_names is None and where_col is None: stmt1 = 'SELECT * FROM ' + table elif where_col is None: stmt1 = 'SELECT ' + str(col_names).replace(''', '"')[1:-1] + ' FROM ' + table else: stmt1 = 'SELECT ' + str(col_names).replace(''', '"')[1:-1] + ' FROM ' + table + ' WHERE ' + str([where_col]).replace(''', '"')[1:-1] + ' IN (' + str(where_val)[1:-1] + ')' conn = connect(server, database=database) df = read_sql(stmt1, conn) ## Read in geometry if required if geo_col: geo_col_stmt = "SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME=" + "'" + table + "'" + " AND DATA_TYPE='geometry'" geo_col = str(read_sql(geo_col_stmt, conn).iloc[0,0]) if where_col is None: stmt2 = 'SELECT ' + geo_col + '.STGeometryN(1).ToString()' + ' FROM ' + table else: stmt2 = 'SELECT ' + geo_col + '.STGeometryN(1).ToString()' + ' FROM ' + table + ' WHERE ' + str([where_col]).replace(''', '"')[1:-1] + ' IN (' + str(where_val)[1:-1] + ')' df2 = read_sql(stmt2, conn) df2.columns = ['geometry'] geometry = [loads(x) for x in df2.geometry] df = GeoDataFrame(df, geometry=geometry, crs={'init' :'epsg:' + str(epsg)}) if export: df.to_csv(path, index=False) conn.close() return(df)
EDIT: Made the function automatically find the geometry field if one exists.