Python Programming
 
Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
User Name:
Password:
Remember me
Go Back   Dev Shed ForumsProgramming LanguagesPython Programming

Reply
Add This Thread To:
  Del.icio.us   Digg   Google   Spurl   Blink   Furl   Simpy   Y! MyWeb 
Thread Tools Search this Thread Rate Thread Display Modes
 
Unread Dev Shed Forums Sponsor:
Stop making mediocre tutorials.The best tutorials are video! Camtasia Studio makes it easy to create engaging, buzz-building screen videos at any size, in any popular format. Download the free trial!
  #1  
Old April 27th, 2008, 06:41 AM
sa125 sa125 is offline
Registered User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Jan 2008
Posts: 8 sa125 User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 1 h 27 m 41 sec
Reputation Power: 0
Large File estimation and list/memory optimization

Hi all -

I'm trying to process very large csv (300+ MB) files and actually have two questions -

1. Is there a way to accurately estimate (within 5% or so) the number of rows in the file without reading it first? I came up with a primitive way of doing it by estimating the average size (in bytes) of a row, and then dividing the file size by that of the average row. This gives me about 85-90% accuracy, but I'd like to do better. Any thoughts will be great.

2. The file contains repeating items that change over time, and since I'd just like the first and last entry of each item, I have 2 dictionaries - ITEM_FIRST and ITEM_LAST that contain that respective data. The keys in those dictionaries are a string that is built from "<item> - <time>" (with values, of course, like "bannana - 10:30"), and my methodology is to check if an item/time combo is in the ITEM_FIRST/LAST.keys() and then adding it if it's not. The problem is that the list of keys grows to a big number, and the processing really slows down. Is there a way to speed things up or make it more efficient?

Thanks!

Reply With Quote
Reply

Viewing: Dev Shed ForumsProgramming LanguagesPython Programming > Large File estimation and list/memory optimization


Thread Tools  Search this Thread 
Search this Thread:

Advanced Search
Display Modes  Rate This Thread 
Rate This Thread:


Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
View Your Warnings | New Posts | Latest News | Latest Threads | Shoutbox
Forum Jump

 Free IT White Papers!
 
Accelerating Trading Partner Performance
One in five. That's how many partner transactions have at least one error. That is an amazing statistic, particularly given the extraordinary leaps in innovation across the global supply chain during the past two decades. Download this white paper to learn more.

 
Competing on Analytics
This Tech Analysis is designed to help identify characteristics shared by analytics competitors, and includes information about 32 organizations that have made a commitment to quantitative, fact-based analysis.

 
Cost Effective Scaling with Virtualization and Coyote Point Systems
An overview of the industry trend toward virtualization, how server consolidation has increased the importance of application uptime and the steps being taken to integrate load balancing technology with virtualized servers.

 
Five Checkpoints to Implementing IP Telephony
Implementation planning for IP PBX software and IP telephony has become vital as businesses replace discontinued legacy PBX phone systems. This informative whitepaper outlines five "checkpoints" for any implementation plan that will help make IP communications a successful proposition.

 
Hosted Email Security: Staying Ahead of New Threats
In the last two years, email has become a fierce battleground between the nefarious forces of spam and malware, and the heroes of messaging protection. The spam volumes increased alarmingly every month, bringing clever new forms of phishing and virus propagation attacks.

 

Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
  
 





© 2003-2008 by Developer Shed. All rights reserved. DS Cluster 3 hosted by Hostway