I have a dataset in a text file in the following format.
The dataset is available at https://drive.google.com/file/d/1RqU2s0dqjd60dcYlxEJ8vnw9_z2fWixd/view?usp=sharing
PMID- 20301691 STAT- Publisher DA - 20100320 DRDT- 20210311 CTDT- 20000204 PB - University of Washington, Seattle DP - 1993 TI - Classic Galactosemia and Clinical Variant Galactosemia BTI - GeneReviews((R)) AB - CLINICAL CHARACTERISTICS: The term "galactosemia" refers to disorders of galactose metabolism that include classic galactosemia, clinical variant galactosemia, and biochemical variant galactosemia (not covered in this chapter). This GeneReview focuses on: Classic galactosemia, which can result in life-threatening complications including feeding problems, failure to thrive, hepatocellular damage, bleeding, and E coli sepsis in untreated infants. If a lactose-restricted diet is provided during the first ten days of life, the neonatal signs usually quickly resolve and the complications of liver failure, sepsis, and neonatal death are prevented; however, despite adequate treatment from an early age, children with classic galactosemia remain at increased risk for developmental delays, speech problems (termed childhood apraxia of speech and dysarthria), and abnormalities of motor function. Almost all females with classic galactosemia manifest hypergonadatropic hypogonadism or premature ovarian insufficiency (POI). Clinical variant galactosemia, which can result in life-threatening complications including feeding problems, failure to thrive, hepatocellular damage including cirrhosis, and bleeding in untreated infants. This is exemplified by the disease that occurs in African Americans and native Africans in South Africa. Persons with clinical variant galactosemia may be missed with newborn screening as the hypergalactosemia is not as marked as in classic galactosemia and breath testing is normal. If a lactose-restricted diet is provided during the first ten days of life, the severe acute neonatal complications are usually prevented. 
African Americans with clinical variant galactosemia and adequate early treatment do not appear to be at risk for long-term complications, including POI. DIAGNOSIS/TESTING: The diagnosis of classic galactosemia and clinical variant galactosemia is established by detection of elevated erythrocyte galactose-1-phosphate concentration, reduced erythrocyte galactose-1-phosphate uridylyltranserase (GALT) enzyme activity, and/or biallelic pathogenic variants in GALT. In classic galactosemia, erythrocyte galactose-1-phosphate is usually >10 mg/dL and erythrocyte GALT enzyme activity is absent or barely detectable. In clinical variant galactosemia, erythrocyte GALT enzyme activity is close to or above 1% of control values but probably never >10%-15%. However, in African Americans with clinical variant galactosemia, the erythrocyte GALT enzyme activity may be absent or barely detectable but is often much higher in liver and in intestinal tissue (e.g., 10% of control values). Virtually 100% of infants with classic galactosemia or clinical variant galactosemia can be detected in newborn screening programs that include testing for galactosemia in their panel. However, infants with clinical variant galactosemia may be missed if the program only measures blood total galactose level and not erythrocyte GALT enzyme activity. MANAGEMENT: Treatment of manifestations: Standard of care in any newborn who is "screen-positive" for galactosemia is immediate dietary intervention while diagnostic testing is under way. Once a diagnosis is confirmed, restriction of galactose intake is continued and all milk products are replaced with lactose-free formulas (e.g., Isomil((R)) or Prosobee((R))) containing non-galactose carbohydrates; dietary restrictions on all lactose-containing foods and other dairy products should continue throughout life, although management of the diet becomes less important after infancy and early childhood. 
In rare instances, cataract surgery may be needed in the first year of life. Childhood apraxia of speech and dysarthria require expert speech therapy. Developmental assessment at age one year by a psychologist and/or developmental pediatrician is recommended in order to formulate a treatment plan with the speech therapist and treating physician. For school-age children, an individual education plan and/or professional help with learning skills and special classrooms as needed. Hormone replacement therapy as needed for delayed pubertal development and/or primary or secondary amenorrhea. Stimulation with follicle-stimulating hormone may be useful in producing ovulation in some women. Prevention of secondary complications: Recommended calcium, vitamin D, and vitamin K intake to help prevent decreased bone mineralization; standard treatment for gastrointestinal dysfunction. Surveillance: Biochemical genetics clinic visits every three months for the first year of life or as needed depending on the nature of the potential acute complications; every six months during the second year of life; yearly thereafter. Routine monitoring for: the accumulation of toxic analytes (e.g., erythrocyte galactose-1-phosphate and urinary galactitol); cataracts; speech and development; movement disorder; POI; nutritional deficiency; and osteoporosis. Agents/circumstances to avoid: Breast milk, proprietary infant formulas containing lactose, cow's milk, dairy products, and casein or whey-containing foods; medications with lactose and galactose. Evaluation of relatives at risk: To allow for earliest possible diagnosis and treatment of at-risk sibs: Perform prenatal diagnosis when the GALT pathogenic variants in the family are known; or If prenatal testing has not been performed, test the newborn for either the family-specific GALT pathogenic variants or erythrocyte GALT enzyme activity. 
Pregnancy management: Women with classic galactosemia should maintain a lactose-restricted diet during pregnancy. GENETIC COUNSELING: Classic galactosemia and clinical variant galactosemia are inherited in an autosomal recessive manner. Couples who have had one affected child have a 25% chance of having an affected child in each subsequent pregnancy. Molecular genetic carrier testing for at-risk sibs and prenatal testing for pregnancies at increased risk are an option if the GALT pathogenic variants in the family are known. If the GALT pathogenic variants in a family are not known, prenatal testing can rely on assay of GALT enzyme activity in cultured amniotic fluid cells. CI - Copyright (c) 1993-2021, University of Washington, Seattle. GeneReviews is a registered trademark of the University of Washington, Seattle. All rights reserved. FED - Adam, Margaret P ED - Adam MP FED - Ardinger, Holly H ED - Ardinger HH FED - Pagon, Roberta A
I want the tag on the left of each line to become a column name, and the value on the right to become the cell in that record's row.
The output should be:
    PMID      STAT       DA        CTDT
    33237688  Publisher  20201126  20201125
I tried reading the text file as a CSV, but that does not work:
    import pandas as pd

    medical = pd.read_csv("sepsis2015.txt", sep="\n")
    print(medical)
Answer
The simplest way I know is:
Read the data file and split it into records on blank lines:
    with open("sepsis2015.txt") as file:
        lines = file.readlines()
    lines = ''.join(lines).split('\n\n')
This will give you a list of your records:
    ['PMID- 20301691 \nSTAT- Publisher\nDA - 20100320\nDRDT- 20210311\nCTDT- 20000204\nPB - University of Washington, Seattle\nDP - 1993\nTI - Classic Galactosemia and Clinical Variant Galactosemia\nBTI - GeneReviews((R))', '\nPMID- 33237688\nSTAT- Publisher\nDA - 20201126\nCTDT- 20201125\nPB - University of Washington, Seattle\nDP - 1993\nTI - MIRAGE Syndrome\nBTI - GeneReviews((R))']
Convert the data stored in the lines list into a data dictionary:
    data = {
        i: {
            item.split('-')[0].replace(' ', ''): item.split('-')[1][1:]
            for item in row.split('\n')
            if '-' in item
        }
        for i, row in enumerate(lines)
    }
so you have:
{0: {'PMID': '20301691', 'STAT': 'Publisher', 'DA': '20100320', 'DRDT': '20210311', 'CTDT': '20000204', 'PB': 'University of Washington, Seattle', 'DP': '1993', 'TI': 'Classic Galactosemia and Clinical Variant Galactosemia', 'BTI': 'GeneReviews((R))'}, 1: {'PMID': '33237688', 'STAT': 'Publisher', 'DA': '20201126', 'CTDT': '20201125', 'PB': 'University of Washington, Seattle', 'DP': '1993', 'TI': 'MIRAGE Syndrome', 'BTI': 'GeneReviews((R))'}}
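One caveat with the comprehension above: it splits on every hyphen, so a value that itself contains a hyphen (for example "lactose-restricted") is cut at that hyphen, and repeated tags such as FED and ED overwrite each other. The following is only a sketch of a more defensive parser, not part of the original answer. It assumes a MEDLINE-style layout in which each field starts at the beginning of a line with an upper-case tag followed by "- ", and wrapped continuation lines are indented. The names `FIELD_RE`, `parse_record` and `parse_records` are made up for illustration.

```python
import re

# Assumption: a field line starts with a 2-4 letter upper-case tag,
# optional padding, then a hyphen. Indented lines never match this.
FIELD_RE = re.compile(r"^([A-Z]{2,4})\s*-\s?(.*)$")


def parse_record(block):
    """Parse one record (a block of text lines) into a {tag: value} dict."""
    record = {}
    last_tag = None
    for line in block.splitlines():
        if not line.strip():
            continue
        match = FIELD_RE.match(line)
        if match:
            tag, value = match.group(1), match.group(2).strip()
            if tag in record:
                # Repeated tags are joined instead of overwritten.
                record[tag] += "; " + value
            else:
                record[tag] = value
            last_tag = tag
        elif last_tag is not None:
            # Any other line is treated as a continuation of the previous field.
            record[last_tag] += " " + line.strip()
    return record


def parse_records(text):
    """Split the whole file on blank lines and parse each record."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return {i: parse_record(block) for i, block in enumerate(blocks)}


sample = (
    "PMID- 20301691\n"
    "STAT- Publisher\n"
    "TI  - Classic Galactosemia and Clinical\n"
    "      Variant Galactosemia\n"
    "AB  - A lactose-restricted diet\n"
    "FED - Adam, Margaret P\n"
    "FED - Ardinger, Holly H\n"
    "\n"
    "PMID- 33237688\n"
    "STAT- Publisher\n"
)
parsed = parse_records(sample)
print(parsed)
```

The resulting dictionary has the same shape as `data`, so it can be passed to `pd.DataFrame.from_dict(parsed, orient='index')` in the same way. If the real file does not indent continuation lines, the regular expression would need adjusting.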
Finally, convert this dictionary into a pandas.DataFrame with:
    df = pd.DataFrame.from_dict(data, orient='index')
Complete Code
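    import pandas as pd

    # Read the whole file and split it into records on blank lines
    with open(r'data/data.csv') as file:
        lines = file.readlines()
    lines = ''.join(lines).split('\n\n')

    # Build {record index: {tag: value}} from the "TAG - value" lines
    data = {
        i: {
            item.split('-')[0].replace(' ', ''): item.split('-')[1][1:]
            for item in row.split('\n')
            if '-' in item
        }
        for i, row in enumerate(lines)
    }
    print(data)

    # One row per record, one column per tag
    df = pd.DataFrame.from_dict(data, orient='index')
    print(df)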
           PMID       STAT        DA      DRDT      CTDT                                 PB    DP                                                      TI               BTI
    0  20301691  Publisher  20100320  20210311  20000204  University of Washington, Seattle  1993  Classic Galactosemia and Clinical Variant Galactosemia  GeneReviews((R))
    1  33237688  Publisher  20201126       NaN  20201125  University of Washington, Seattle  1993                                         MIRAGE Syndrome  GeneReviews((R))
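The question asks for only the PMID, STAT, DA and CTDT columns. A minimal sketch of how the frame could be narrowed to those columns and turned into CSV text follows. It uses a hand-built two-record dictionary in place of the parsed file, and the variable names `wanted`, `subset` and `csv_text` are illustrative only.

```python
import pandas as pd

# Stand-in for the parsed dictionary, reduced to a few fields
data = {
    0: {"PMID": "20301691", "STAT": "Publisher", "DA": "20100320",
        "DRDT": "20210311", "CTDT": "20000204"},
    1: {"PMID": "33237688", "STAT": "Publisher", "DA": "20201126",
        "CTDT": "20201125"},
}
df = pd.DataFrame.from_dict(data, orient="index")

# Keep only the requested columns, in the requested order.
# reindex fills a column with NaN if a tag is absent from every record.
wanted = ["PMID", "STAT", "DA", "CTDT"]
subset = df.reindex(columns=wanted)

# With no path argument, to_csv returns the CSV as a string
csv_text = subset.to_csv(index=False)
print(csv_text)
```

Passing a file name to `to_csv` instead would write the same content to disk.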